History log of /linux-master/drivers/infiniband/core/cma.c
Revision Date Author Comments
# e0fe97ef 26-Sep-2023 Mark Zhang <markzhang@nvidia.com>

RDMA/cma: Initialize ib_sa_multicast structure to 0 when join

Initialize the structure to 0 so that its fields won't have random
values. For example, fields like rec.traffic_class (as well as
rec.flow_label and rec.sl) are used to generate the user AH through:
cma_iboe_join_multicast
cma_make_mc_event
ib_init_ah_from_mcmember

And a random traffic_class causes a random IP DSCP in RoCEv2.
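
As a rough illustration (a minimal sketch, not the actual patch; the struct
below is a hypothetical stand-in for the real record), zero-initializing the
on-stack structure guarantees that fields the join path never writes start at
0 instead of stack garbage:

	#include <linux/types.h>

	struct mc_rec_example {			/* hypothetical stand-in */
		u8	traffic_class;
		__be32	flow_label;
		u8	sl;
	};

	static void example_join_prepare(struct mc_rec_example *out)
	{
		struct mc_rec_example rec = {};	/* every field starts at 0 */

		/* ... fill in only the fields the join actually needs ... */
		*out = rec;
	}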

Fixes: b5de0c60cc30 ("RDMA/cma: Fix use after free race in roce multicast join")
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/20230927090511.603595-1-markzhang@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# f8ef1be8 17-Jul-2023 Chuck Lever <chuck.lever@oracle.com>

RDMA/cma: Avoid GID lookups on iWARP devices

We would like to enable the use of siw on top of a VPN that is
constructed and managed via a tun device. That hasn't worked up
until now because ARPHRD_NONE devices (such as tun devices) have
no GID for the RDMA/core to look up.

But it turns out that the egress device has already been picked for
us -- no GID is necessary. addr_handler() just has to do the right
thing with it.

Link: https://lore.kernel.org/r/168960675257.3007.4737911174148394395.stgit@manet.1015granger.net
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 700c9649 17-Jul-2023 Chuck Lever <chuck.lever@oracle.com>

RDMA/cma: Deduplicate error flow in cma_validate_port()

Clean up to prepare for the addition of new logic.

Link: https://lore.kernel.org/r/168960674597.3007.6128252077812202526.stgit@manet.1015granger.net
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 0e158630 12-Jul-2023 Shiraz Saleem <shiraz.saleem@intel.com>

RDMA/core: Update CMA destination address on rdma_resolve_addr

8d037973d48c ("RDMA/core: Refactor rdma_bind_addr") introduces a regression
on irdma devices in certain tests which use rdma CM, such as cmtime.

No connections can be established and the MAD QP experiences a fatal
error on the active side.

The cma destination address is not updated with the dst_addr when the ULP
on the active side calls rdma_bind_addr followed by rdma_resolve_addr.
The id_priv state is 'bound' in resolve_prepare_src and the update is skipped.

This leaves the dgid passed into the irdma driver to create an Address Handle
(AH) for the MAD QP at 0. The create-AH descriptor as well as the ARP cache
entry is invalid, and HW throws asynchronous events as a result.

[ 1207.656888] resolve_prepare_src caller: ucma_resolve_addr+0xff/0x170 [rdma_ucm] daddr=200.0.4.28 id_priv->state=7
[....]
[ 1207.680362] ice 0000:07:00.1 rocep7s0f1: caller: irdma_create_ah+0x3e/0x70 [irdma] ah_id=0 arp_idx=0 dest_ip=0.0.0.0
destMAC=00:00:64:ca:b7:52 ipvalid=1 raw=0000:0000:0000:0000:0000:ffff:0000:0000
[ 1207.682077] ice 0000:07:00.1 rocep7s0f1: abnormal ae_id = 0x401 bool qp=1 qp_id = 1, ae_src=5
[ 1207.691657] infiniband rocep7s0f1: Fatal error (1) on MAD QP (1)

Fix this by updating the CMA destination address when the ULP calls
rdma_resolve_addr with the CM state already bound.
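
For reference, a hedged sketch of the active-side call sequence described
above (error handling and the event handler are trimmed; rdma_bind_addr()
and rdma_resolve_addr() are the real RDMA CM entry points, the rest is
illustrative):

	#include <rdma/rdma_cm.h>

	static int example_active_side(struct rdma_cm_id *id,
				       struct sockaddr *src, struct sockaddr *dst)
	{
		int ret;

		/* Bind first: the id_priv moves to the 'bound' state. */
		ret = rdma_bind_addr(id, src);
		if (ret)
			return ret;

		/*
		 * Resolve next: with the fix, the destination address is
		 * updated here even though the id is already bound.
		 */
		return rdma_resolve_addr(id, src, dst, 2000 /* ms */);
	}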

Fixes: 8d037973d48c ("RDMA/core: Refactor rdma_bind_addr")
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20230712234133.1343-1-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# 6735041f 13-Jun-2023 Yang Li <yang.lee@linux.alibaba.com>

RDMA/cma: Remove NULL check before dev_{put, hold}

The netdev_{put, hold} helpers called by dev_{put, hold} already check for
NULL, so there is no need to check before using dev_{put, hold}. Remove the
check to silence the warning:

./drivers/infiniband/core/cma.c:4812:2-9: WARNING: NULL check before dev_{put, hold} functions is not needed.
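
A minimal before/after sketch of the pattern being removed (dev_put()
itself tolerates a NULL argument, so the guard is redundant):

	/* Before */
	if (ndev)
		dev_put(ndev);

	/* After */
	dev_put(ndev);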

Link: https://lore.kernel.org/r/20230614014328.14007-1-yang.lee@linux.alibaba.com
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5521
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 58030c76 05-Jun-2023 Mark Zhang <markzhang@nvidia.com>

RDMA/cma: Always set static rate to 0 for RoCE

Set the static rate to 0 as it should be discovered by path query and
has no meaning for RoCE.
This also avoids using the rtnl lock and the ethtool API, which is
a bottleneck when trying to set up many rdma-cm connections at the same
time, especially with multiple processes.

Fixes: 3c86aa70bf67 ("RDMA/cm: Add RDMA CM support for IBoE devices")
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/f72a4f8b667b803aee9fa794069f61afb5839ce4.1685960567.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# 08ebf57f 30-Mar-2023 Yang Li <yang.lee@linux.alibaba.com>

RDMA/cma: Remove NULL check before dev_{put, hold}

The netdev_{put, hold} helpers called by dev_{put, hold} already check for
NULL, so there is no need to check before using dev_{put, hold}.
Remove the checks to silence the warnings:

./drivers/infiniband/core/cma.c:713:2-9: WARNING: NULL check before dev_{put, hold} functions is not needed.
./drivers/infiniband/core/cma.c:2433:2-9: WARNING: NULL check before dev_{put, hold} functions is not needed.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4668
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230331010633.63261-1-yang.lee@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# 58e84f6b 19-Mar-2023 Mark Zhang <markzhang@nvidia.com>

RDMA/cma: Allow UD qp_type to join multicast only

As for multicast:
- SIDR is the only mode that makes sense;
- Besides PS_UDP, other port spaces like PS_IB are also allowed, as they are
UD compatible. In this case the qkey also needs to be set [1].

This patch allows only the UD qp_type to join multicast, and sets the qkey
to the default if it's not set, to fix an uninit-value error: the ib->rec.qkey
field is accessed without being initialized.

=====================================================
BUG: KMSAN: uninit-value in cma_set_qkey drivers/infiniband/core/cma.c:510 [inline]
BUG: KMSAN: uninit-value in cma_make_mc_event+0xb73/0xe00 drivers/infiniband/core/cma.c:4570
cma_set_qkey drivers/infiniband/core/cma.c:510 [inline]
cma_make_mc_event+0xb73/0xe00 drivers/infiniband/core/cma.c:4570
cma_iboe_join_multicast drivers/infiniband/core/cma.c:4782 [inline]
rdma_join_multicast+0x2b83/0x30a0 drivers/infiniband/core/cma.c:4814
ucma_process_join+0xa76/0xf60 drivers/infiniband/core/ucma.c:1479
ucma_join_multicast+0x1e3/0x250 drivers/infiniband/core/ucma.c:1546
ucma_write+0x639/0x6d0 drivers/infiniband/core/ucma.c:1732
vfs_write+0x8ce/0x2030 fs/read_write.c:588
ksys_write+0x28c/0x520 fs/read_write.c:643
__do_sys_write fs/read_write.c:655 [inline]
__se_sys_write fs/read_write.c:652 [inline]
__ia32_sys_write+0xdb/0x120 fs/read_write.c:652
do_syscall_32_irqs_on arch/x86/entry/common.c:114 [inline]
__do_fast_syscall_32+0x96/0xf0 arch/x86/entry/common.c:180
do_fast_syscall_32+0x34/0x70 arch/x86/entry/common.c:205
do_SYSENTER_32+0x1b/0x20 arch/x86/entry/common.c:248
entry_SYSENTER_compat_after_hwframe+0x4d/0x5c

Local variable ib.i created at:
cma_iboe_join_multicast drivers/infiniband/core/cma.c:4737 [inline]
rdma_join_multicast+0x586/0x30a0 drivers/infiniband/core/cma.c:4814
ucma_process_join+0xa76/0xf60 drivers/infiniband/core/ucma.c:1479

CPU: 0 PID: 29874 Comm: syz-executor.3 Not tainted 5.16.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
=====================================================

[1] https://lore.kernel.org/linux-rdma/20220117183832.GD84788@nvidia.com/

Fixes: b5de0c60cc30 ("RDMA/cma: Fix use after free race in roce multicast join")
Reported-by: syzbot+8fcbb77276d43cc8b693@syzkaller.appspotmail.com
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/58a4a98323b5e6b1282e83f6b76960d06e43b9fa.1679309909.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# 876e480d 08-Feb-2023 Kees Cook <keescook@chromium.org>

RDMA/cma: Distinguish between sockaddr_in and sockaddr_in6 by size

Clang can do some aggressive inlining, which provides it with greater
visibility into the sizes of various objects that are passed into
helpers. Specifically, compare_netdev_and_ip() can see through the type
given to the "sa" argument, which means it can generate code for "struct
sockaddr_in" that would have been passed to ipv6_addr_cmp() (that expects
to operate on the larger "struct sockaddr_in6"), which would result in a
compile-time buffer overflow condition detected by memcmp(). Logically,
this state isn't reachable due to the sa_family assignment two callers
above and the check in compare_netdev_and_ip(). Instead, provide a
compile-time check on sizes so the size-mismatched code will be elided
when inlining. Avoids the following warning from Clang:

../include/linux/fortify-string.h:652:4: error: call to '__read_overflow' declared with 'error' attribute: detected read beyond size of object (1st parameter)
__read_overflow();
^
note: In function 'cma_netevent_callback'
note: which inlined function 'node_from_ndev_ip'
1 error generated.

When the underlying object size is not known (e.g. with GCC and older
Clang), the result of __builtin_object_size() is SIZE_MAX, which will also
compile away, leaving the code as it was originally.
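
A rough sketch of the pattern (illustrative only, not the exact kernel
change): guard the branch that reads the larger sockaddr_in6 with
__builtin_object_size(), so the compiler elides it when inlining proves the
object is the smaller sockaddr_in:

	#include <linux/socket.h>
	#include <linux/in.h>
	#include <linux/in6.h>
	#include <linux/string.h>

	static int example_cmp(const struct sockaddr *sa, const struct sockaddr *sb)
	{
		if (sa->sa_family == AF_INET6) {
			/* Compiles away when the object is provably too small;
			 * __builtin_object_size() is SIZE_MAX when unknown. */
			if (__builtin_object_size(sa, 0) < sizeof(struct sockaddr_in6))
				return 1;
			return memcmp(sa, sb, sizeof(struct sockaddr_in6));
		}
		return memcmp(sa, sb, sizeof(struct sockaddr_in));
	}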

Link: https://lore.kernel.org/r/20230208232549.never.139-kees@kernel.org
Link: https://github.com/ClangBuiltLinux/linux/issues/1687
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Nathan Chancellor <nathan@kernel.org> # build
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# ccae0447 04-Jan-2023 Mark Zhang <markzhang@nvidia.com>

RDMA/cma: Refactor the inbound/outbound path records process flow

Refactor based on the comments [1] on the multiple path records support
patchset:
- Return failure if not able to set inbound/outbound PRs;
- Simplify the flow when receiving the PRs from netlink channel: When
a good PR response is received, unpack it and call the path_query
callback directly. This saves two memory allocations;
- Define RDMA_PRIMARY_PATH_MAX_REC_NUM in a proper place.

[1] https://lore.kernel.org/linux-rdma/Yyxp9E9pJtUids2o@nvidia.com/

Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org> #srp
Link: https://lore.kernel.org/r/7610025d57342b8b6da0f19516c9612f9c3fdc37.1672819376.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# 8d037973 04-Jan-2023 Patrisious Haddad <phaddad@nvidia.com>

RDMA/core: Refactor rdma_bind_addr

Refactor the rdma_bind_addr function so that it doesn't require the
cma destination address to be changed before calling it.

Now it will update the destination address internally only when it is
really needed and after passing all the required checks.

This in turn results in cleaner and more sensible call and error
handling flows for the functions that call it directly or indirectly.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reported-by: Wei Chen <harperchen1110@gmail.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/3d0e9a2fd62bc10ba02fed1c7c48a48638952320.1672819273.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# fb4907f4 24-Nov-2022 Chao Leng <lengchao@huawei.com>

RDMA/cma: Change RoCE packet life time from 18 to 16

The ack timeout retransmission time is affected by the following two
factors: one is the packet lifetime, the other is the HCA processing time.

Now the default packet lifetime (CMA_IBOE_PACKET_LIFETIME) is 18.

That means the minimum ack timeout is 2 seconds
(2^(18+1) * 4us = 2.097 seconds). The packet lifetime is the maximum
transmission time of packets on the network, and 2 seconds is too
long.

Assume the network is a Clos topology with three layers, so every packet will
pass through five hops of switches. Assuming the buffer of every switch is
128 MB and the port transmission rate is 25 Gbit/s, the maximum
transmission time of a packet is 200 ms (128 MB * 5 / 25 Gbit/s). With double
redundancy added, it is less than 500 ms.

So change CMA_IBOE_PACKET_LIFETIME to 16; the maximum transmission
time of a packet will then be about 500+ ms, which is long enough.
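
The arithmetic above can be sketched as follows (illustrative helper, not
kernel code; the encoding is time = 4.096 us * 2^value, and the minimum ACK
timeout is roughly twice the packet lifetime):

	#include <linux/types.h>

	static u64 iboe_min_ack_timeout_ns(u8 lifetime_enc)
	{
		/* 4.096 us = 4096 ns; ACK timeout ~= 2 * packet lifetime */
		return 4096ULL << (lifetime_enc + 1);
	}

	/* lifetime_enc = 18 -> ~2.1 s, lifetime_enc = 16 -> ~0.54 s */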

Link: https://lore.kernel.org/r/20221125010026.755-1-lengchao@huawei.com
Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# e8a533cb 09-Oct-2022 Jason A. Donenfeld <Jason@zx2c4.com>

treewide: use get_random_u32_inclusive() when possible

These cases were done with this Coccinelle:

@@
expression H;
expression L;
@@
- (get_random_u32_below(H) + L)
+ get_random_u32_inclusive(L, H + L - 1)

@@
expression H;
expression L;
expression E;
@@
get_random_u32_inclusive(L,
H
- + E
- - E
)

@@
expression H;
expression L;
expression E;
@@
get_random_u32_inclusive(L,
H
- - E
- + E
)

@@
expression H;
expression L;
expression E;
expression F;
@@
get_random_u32_inclusive(L,
H
- - E
+ F
- + E
)

@@
expression H;
expression L;
expression E;
expression F;
@@
get_random_u32_inclusive(L,
H
- + E
+ F
- - E
)

And then subsequently cleaned up by hand, with several automatic cases
rejected if it didn't make sense contextually.

Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 8032bf12 09-Oct-2022 Jason A. Donenfeld <Jason@zx2c4.com>

treewide: use get_random_u32_below() instead of deprecated function

This is a simple mechanical transformation done by:

@@
expression E;
@@
- prandom_u32_max
+ get_random_u32_below
(E)

Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
Reviewed-by: SeongJae Park <sj@kernel.org> # for damon
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> # for arm
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# eb83f502 12-Oct-2022 Håkon Bugge <haakon.bugge@oracle.com>

RDMA/cma: Use output interface for net_dev check

Commit 27cfde795a96 ("RDMA/cma: Fix arguments order in net device
validation") swapped the src and dst addresses in the call to
validate_net_dev().

As a consequence, the test in validate_ipv4_net_dev() to see if the
net_dev is the right one is incorrect for port 1 <-> 2 communication when
the ports are on the same sub-net. This is fixed by setting flowi4_oif to
the output device instead of the incoming one.

The bug has not been observed using IPv6 addresses.

Fixes: 27cfde795a96 ("RDMA/cma: Fix arguments order in net device validation")
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Link: https://lore.kernel.org/r/20221012141542.16925-1-haakon.bugge@oracle.com
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# 81895a65 05-Oct-2022 Jason A. Donenfeld <Jason@zx2c4.com>

treewide: use prandom_u32_max() when possible, part 1

Rather than incurring a division or requesting too many random bytes for
the given range, use the prandom_u32_max() function, which only takes
the minimum required bytes from the RNG and avoids divisions. This was
done mechanically with this coccinelle script:

@basic@
expression E;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u64;
@@
(
- ((T)get_random_u32() % (E))
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ((E) - 1))
+ prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2)
|
- ((u64)(E) * get_random_u32() >> 32)
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ~PAGE_MASK)
+ prandom_u32_max(PAGE_SIZE)
)

@multi_line@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
identifier RAND;
expression E;
@@

- RAND = get_random_u32();
... when != RAND
- RAND %= (E);
+ RAND = prandom_u32_max(E);

// Find a potential literal
@literal_mask@
expression LITERAL;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
position p;
@@

((T)get_random_u32()@p & (LITERAL))

// Add one to the literal.
@script:python add_one@
literal << literal_mask.LITERAL;
RESULT;
@@

value = None
if literal.startswith('0x'):
    value = int(literal, 16)
elif literal[0] in '123456789':
    value = int(literal, 10)
if value is None:
    print("I don't know how to handle %s" % (literal))
    cocci.include_match(False)
elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1:
    print("Skipping 0x%x for cleanup elsewhere" % (value))
    cocci.include_match(False)
elif value & (value + 1) != 0:
    print("Skipping 0x%x because it's not a power of two minus one" % (value))
    cocci.include_match(False)
elif literal.startswith('0x'):
    coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1))
else:
    coccinelle.RESULT = cocci.make_expr("%d" % (value + 1))

// Replace the literal mask with the calculated result.
@plus_one@
expression literal_mask.LITERAL;
position literal_mask.p;
expression add_one.RESULT;
identifier FUNC;
@@

- (FUNC()@p & (LITERAL))
+ prandom_u32_max(RESULT)

@collapse_ret@
type T;
identifier VAR;
expression E;
@@

{
- T VAR;
- VAR = (E);
- return VAR;
+ return E;
}

@drop_var@
type T;
identifier VAR;
@@

{
- T VAR;
... when != VAR
}

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: KP Singh <kpsingh@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# eb8336db 08-Sep-2022 Mark Zhang <markzhang@nvidia.com>

RDMA/cm: Use DLID from inbound/outbound PathRecords as the datapath DLID

In inter-subnet cases, when inbound/outbound PRs are available,
outbound_PR.dlid is used as the requestor's datapath DLID and
inbound_PR.dlid is used as the responder's DLID. The inbound_PR.dlid
is passed to responder side with the "ConnectReq.Primary_Local_Port_LID"
field. With this solution the PERMISSIVE_LID is no longer used in
Primary Local LID field.

Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Link: https://lore.kernel.org/r/b3f6cac685bce9dde37c610be82e2c19d9e51d9e.1662631201.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# 5a374949 08-Sep-2022 Mark Zhang <markzhang@nvidia.com>

RDMA/cma: Multiple path records support with netlink channel

Support receiving inbound and outbound IB path records (along with the GMP
PathRecord) from a user-space service through the RDMA netlink channel.
The LIDs in these 3 PRs can be used in this way:
1. GMP PR: used as the standard local/remote LIDs;
2. DLID of outbound PR: used as the "dlid" field for outbound traffic;
3. DLID of inbound PR: used as the "dlid" field for outbound traffic on
the responder side.

This is aimed at supporting adaptive routing. With the current IB routing
solution, when a packet goes out it is assigned a fixed DLID per
target, meaning a fixed router will be used.
The LIDs in inbound/outbound path records can be used to identify group
of routers that allow communication with another subnet's entity. With
them packets from an inter-subnet connection may travel through any
router in the set to reach the target.

As confirmed with Jason, when sending a netlink request, the kernel uses
LS_RESOLVE_PATH_USE_ALL so that the service knows the kernel supports
multiple PRs.

Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Link: https://lore.kernel.org/r/2fa2b6c93c4c16c8915bac3cfc4f27be1d60519d.1662631201.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# bf9a9928 08-Sep-2022 Mark Zhang <markzhang@nvidia.com>

RDMA/core: Rename rdma_route.num_paths field to num_pri_alt_paths

This field means the total number of primary and alternate paths,
i.e.:
0 - Neither a primary nor an alternate path is available;
1 - Only the primary path is available;
2 - Both primary and alternate paths are available.
Rename it to avoid confusion, as with follow-up patches the primary path
will support multiple path records.

Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Link: https://lore.kernel.org/r/cbe424de63a56207870d70c5edce7c68e45f429e.1662631201.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# 27cfde79 23-Aug-2022 Michael Guralnik <michaelgur@nvidia.com>

RDMA/cma: Fix arguments order in net device validation

Fix the order of source and destination addresses when resolving the
route between server and client to validate use of the correct net device.

The reverse order we had so far didn't actually validate the net device
as the server would try to resolve the route to itself, thus always
getting the server's net device.

The issue was discovered when running cm applications on a single host
between 2 interfaces with the same subnet and source-based routing rules.
When resolving the reverse route the source-based routing rules were
ignored.

Fixes: f887f2ac87c2 ("IB/cma: Validate routing of incoming requests")
Link: https://lore.kernel.org/r/1c1ec2277a131d277ebcceec987fd338d35b775f.1661251872.git.leonro@nvidia.com
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# 925d046e 07-Jun-2022 Patrisious Haddad <phaddad@nvidia.com>

RDMA/core: Add a netevent notifier to cma

Add a netevent callback for cma, mainly to catch NETEVENT_NEIGH_UPDATE.

Previously, when a system with a failover MAC mechanism changed its MAC address
during a CM connection attempt, the RDMA-CM would take a lot of time until
it disconnected and timed out due to the incorrect MAC address.

Now when we get a NETEVENT_NEIGH_UPDATE we check if it is due to a failover
MAC change, and if so, we instantly destroy the CM and notify the user in order
to spare the unnecessary wait for the timeout.

Link: https://lore.kernel.org/r/bb255c9e301cd50b905663b8e73f7f5133d0e4c5.1654601342.git.leonro@nvidia.com
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# fc008bdb 07-Jun-2022 Patrisious Haddad <phaddad@nvidia.com>

RDMA/core: Add an rb_tree that stores cm_ids sorted by ifindex and remote IP

Add to the cma a tree that keeps track of all rdma_id_private channels
that were created while in RoCE mode.

The IDs are sorted first according to their netdevice ifindex, then their
destination IP. IDs with a matching IP are kept at the same node in the
tree, since the tree data is a list of all ids with a matching destination IP.

The tree allows fast and efficient lookup of ids using an ifindex and
IP address, which is useful for identifying relevant net_events promptly.
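
A simplified sketch of the data structure described above (IPv4-only and
with hypothetical names, not the kernel's actual definitions): each rb-tree
node is keyed by (ifindex, destination IP) and carries the list of ids that
share that key:

	#include <linux/rbtree.h>
	#include <linux/list.h>
	#include <linux/types.h>

	struct id_table_entry {			/* hypothetical */
		struct rb_node rb;
		int ifindex;
		__be32 dst_ip;			/* IPv4 only, for brevity */
		struct list_head id_list;	/* ids sharing (ifindex, dst_ip) */
	};

	static struct id_table_entry *entry_lookup(struct rb_root *root,
						   int ifindex, __be32 ip)
	{
		struct rb_node *node = root->rb_node;

		while (node) {
			struct id_table_entry *e =
				rb_entry(node, struct id_table_entry, rb);

			/* Compare by ifindex first, then by destination IP. */
			if (ifindex != e->ifindex)
				node = ifindex < e->ifindex ?
					node->rb_left : node->rb_right;
			else if (ip != e->dst_ip)
				node = (__force u32)ip < (__force u32)e->dst_ip ?
					node->rb_left : node->rb_right;
			else
				return e;
		}
		return NULL;
	}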

Link: https://lore.kernel.org/r/2fac52c86cc918c634ab24b3867d4aed992f54ec.1654601342.git.leonro@nvidia.com
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# 748663c8 09-Feb-2022 Håkon Bugge <haakon.bugge@oracle.com>

IB/cma: Allow XRC INI QPs to set their local ACK timeout

XRC INI QPs should be able to adjust their local ACK timeout.

Fixes: 2c1619edef61 ("IB/cma: Define option to set ack timeout and pack tos_set")
Link: https://lore.kernel.org/r/1644421175-31943-1-git-send-email-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Suggested-by: Avneesh Pant <avneesh.pant@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 22e9f710 23-Feb-2022 Jason Gunthorpe <jgg@ziepe.ca>

RDMA/cma: Do not change route.addr.src_addr outside state checks

If the state is not idle then resolve_prepare_src() should immediately
fail and no change to global state should happen. However, it
unconditionally overwrites the src_addr trying to build a temporary any
address.

For instance if the state is already RDMA_CM_LISTEN then this will corrupt
the src_addr and would cause the test in cma_cancel_operation():

if (cma_any_addr(cma_src_addr(id_priv)) && !id_priv->cma_dev)

Which would manifest as this trace from syzkaller:

BUG: KASAN: use-after-free in __list_add_valid+0x93/0xa0 lib/list_debug.c:26
Read of size 8 at addr ffff8881546491e0 by task syz-executor.1/32204

CPU: 1 PID: 32204 Comm: syz-executor.1 Not tainted 5.12.0-rc8-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x141/0x1d7 lib/dump_stack.c:120
print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:232
__kasan_report mm/kasan/report.c:399 [inline]
kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416
__list_add_valid+0x93/0xa0 lib/list_debug.c:26
__list_add include/linux/list.h:67 [inline]
list_add_tail include/linux/list.h:100 [inline]
cma_listen_on_all drivers/infiniband/core/cma.c:2557 [inline]
rdma_listen+0x787/0xe00 drivers/infiniband/core/cma.c:3751
ucma_listen+0x16a/0x210 drivers/infiniband/core/ucma.c:1102
ucma_write+0x259/0x350 drivers/infiniband/core/ucma.c:1732
vfs_write+0x28e/0xa30 fs/read_write.c:603
ksys_write+0x1ee/0x250 fs/read_write.c:658
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xae

This is indicating that an rdma_id_private was destroyed without doing
cma_cancel_listens().

Instead of trying to re-use the src_addr memory to indirectly create an
any address derived from the dst, build one explicitly on the stack and
bind to that as any other normal flow would do. rdma_bind_addr() will copy
it over the src_addr once it knows the state is valid.

This is similar to commit bc0bdc5afaa7 ("RDMA/cma: Do not change
route.addr.src_addr.ss_family")

Link: https://lore.kernel.org/r/0-v2-e975c8fd9ef2+11e-syz_cma_srcaddr_jgg@nvidia.com
Cc: stable@vger.kernel.org
Fixes: 732d41c545bb ("RDMA/cma: Make the locking for automatic state transition more clear")
Reported-by: syzbot+c94a3675a626f6333d74@syzkaller.appspotmail.com
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# d9e410eb 18-Jan-2022 Maor Gottlieb <maorg@nvidia.com>

RDMA/cma: Use correct address when leaving multicast group

In RoCE we should use cma_iboe_set_mgid() and not cma_set_mgid to generate
the mgid, otherwise we will generate an IGMP for an incorrect address.

Fixes: b5de0c60cc30 ("RDMA/cma: Fix use after free race in roce multicast join")
Link: https://lore.kernel.org/r/913bc6783fd7a95fe71ad9454e01653ee6fb4a9a.1642491047.git.leonro@nvidia.com
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 20679094 09-Dec-2021 Avihai Horon <avihaih@nvidia.com>

RDMA/cma: Let cma_resolve_ib_dev() continue search even after empty entry

Currently, when cma_resolve_ib_dev() searches for a matching GID it will
stop searching after encountering the first empty GID table entry. This
behavior is wrong since neither the IB nor the RoCE spec enforces tightly
packed GID tables.

For example, when the matching valid GID entry exists at index N, and if a
GID entry is empty at index N-1, cma_resolve_ib_dev() will fail to find
the matching valid entry.

Fix it by making cma_resolve_ib_dev() continue searching even after
encountering missing entries.

Fixes: f17df3b0dede ("RDMA/cma: Add support for AF_IB to rdma_resolve_addr()")
Link: https://lore.kernel.org/r/b7346307e3bb396c43d67d924348c6c496493991.1639055490.git.leonro@nvidia.com
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 8d0d2b0f 23-Nov-2021 Håkon Bugge <haakon.bugge@oracle.com>

RDMA/cma: Remove open coding of overflow checking for private_data_len

The existing tests are a little hard to comprehend. Use
check_add_overflow() instead.
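
For illustration, a hedged sketch of the pattern (made-up variable names):
check_add_overflow() from <linux/overflow.h> reports whether the addition
wrapped, replacing the hand-rolled comparisons:

	#include <linux/overflow.h>
	#include <linux/errno.h>
	#include <linux/types.h>

	static int example_check_len(size_t offset, u8 private_data_len, size_t max)
	{
		size_t total;

		/* true if offset + private_data_len overflows */
		if (check_add_overflow(offset, (size_t)private_data_len, &total) ||
		    total > max)
			return -EINVAL;
		return 0;
	}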

Fixes: 04ded1672402 ("RDMA/cma: Verify private data length")
Link: https://lore.kernel.org/r/1637661978-18770-1-git-send-email-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 99cfddb8 15-Sep-2021 Jason Gunthorpe <jgg@ziepe.ca>

RDMA/cma: Split apart the multiple uses of the same list heads

Two list heads in the rdma_id_private are being used for multiple
purposes, to save a few bytes of memory. Give the different purposes
different names and union the memory that is clearly exclusive.

list splits into device_item and listen_any_item. device_item is threaded
onto the cma_device's list and listen_any goes onto the
listen_any_list. IDs doing any listen cannot have devices.

listen_list splits into listen_item and listen_list. listen_list is on the
parent listen any rdma_id_private and listen_item is on child listen that
is bound to a specific cma_dev.

Which name should be used in which case depends on the state and other
factors of the rdma_id_private. Remap all the confusing references to make
sense with the new names, so at least there is some hope of matching the
necessary preconditions with each access.

Link: https://lore.kernel.org/r/0-v1-a5ead4a0c19d+c3a-cma_list_head_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 305d568b 16-Sep-2021 Jason Gunthorpe <jgg@ziepe.ca>

RDMA/cma: Ensure rdma_addr_cancel() happens before issuing more requests

The FSM can run in a circle allowing rdma_resolve_ip() to be called twice
on the same id_priv. While this cannot happen without going through the
work, it violates the invariant that the same address resolution
background request cannot be active twice.

           CPU 1                              CPU 2

rdma_resolve_addr():
  RDMA_CM_IDLE -> RDMA_CM_ADDR_QUERY
  rdma_resolve_ip(addr_handler) #1

                            process_one_req(): for #1
                             addr_handler():
                               RDMA_CM_ADDR_QUERY -> RDMA_CM_ADDR_BOUND
                               mutex_unlock(&id_priv->handler_mutex);
                               [.. handler still running ..]

rdma_resolve_addr():
  RDMA_CM_ADDR_BOUND -> RDMA_CM_ADDR_QUERY
  rdma_resolve_ip(addr_handler)
    !! two requests are now on the req_list

rdma_destroy_id():
 destroy_id_handler_unlock():
  _destroy_id():
   cma_cancel_operation():
    rdma_addr_cancel()

                            // process_one_req() self removes it
                            spin_lock_bh(&lock);
                             cancel_delayed_work(&req->work);
                             if (!list_empty(&req->list)) == true

    ! rdma_addr_cancel() returns after process_on_req #1 is done

kfree(id_priv)

                            process_one_req(): for #2
                             addr_handler():
                               mutex_lock(&id_priv->handler_mutex);
                               !! Use after free on id_priv

rdma_addr_cancel() expects there to be one req on the list and only
cancels the first one. The self-removal behavior of the work only happens
after the handler has returned. This yields a situations where the
req_list can have two reqs for the same "handle" but rdma_addr_cancel()
only cancels the first one.

The second req remains active beyond rdma_destroy_id() and will
use-after-free id_priv once it inevitably triggers.

Fix this by remembering if the id_priv has called rdma_resolve_ip() and
always cancel before calling it again. This ensures the req_list never
gets more than one item in it and doesn't cost anything in the normal flow
that never uses this strange error path.

Link: https://lore.kernel.org/r/0-v1-3bc675b8006d+22-syz_cancel_uaf_jgg@nvidia.com
Cc: stable@vger.kernel.org
Fixes: e51060f08a61 ("IB: IP address based RDMA connection manager")
Reported-by: syzbot+dc3dfba010d7671e05f5@syzkaller.appspotmail.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# bc0bdc5a 15-Sep-2021 Jason Gunthorpe <jgg@ziepe.ca>

RDMA/cma: Do not change route.addr.src_addr.ss_family

If the state is not idle then rdma_bind_addr() will immediately fail and
no change to global state should happen.

For instance if the state is already RDMA_CM_LISTEN then this will corrupt
the src_addr and would cause the test in cma_cancel_operation():

if (cma_any_addr(cma_src_addr(id_priv)) && !id_priv->cma_dev)

The test would then see a mangled src_addr, e.g. with an IPv6 loopback
address but an IPv4 family, and fail.

This would manifest as this trace from syzkaller:

BUG: KASAN: use-after-free in __list_add_valid+0x93/0xa0 lib/list_debug.c:26
Read of size 8 at addr ffff8881546491e0 by task syz-executor.1/32204

CPU: 1 PID: 32204 Comm: syz-executor.1 Not tainted 5.12.0-rc8-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x141/0x1d7 lib/dump_stack.c:120
print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:232
__kasan_report mm/kasan/report.c:399 [inline]
kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416
__list_add_valid+0x93/0xa0 lib/list_debug.c:26
__list_add include/linux/list.h:67 [inline]
list_add_tail include/linux/list.h:100 [inline]
cma_listen_on_all drivers/infiniband/core/cma.c:2557 [inline]
rdma_listen+0x787/0xe00 drivers/infiniband/core/cma.c:3751
ucma_listen+0x16a/0x210 drivers/infiniband/core/ucma.c:1102
ucma_write+0x259/0x350 drivers/infiniband/core/ucma.c:1732
vfs_write+0x28e/0xa30 fs/read_write.c:603
ksys_write+0x1ee/0x250 fs/read_write.c:658
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xae

Which is indicating that an rdma_id_private was destroyed without doing
cma_cancel_listens().

Instead of trying to re-use the src_addr memory to indirectly create an
any address, build one explicitly on the stack and bind to that as any
other normal flow would do.

Link: https://lore.kernel.org/r/0-v1-9fbb33f5e201+2a-cma_listen_jgg@nvidia.com
Cc: stable@vger.kernel.org
Fixes: 732d41c545bb ("RDMA/cma: Make the locking for automatic state transition more clear")
Reported-by: syzbot+6bb0528b13611047209c@syzkaller.appspotmail.com
Tested-by: Hao Sun <sunhao.th@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# ca465e1f 13-Sep-2021 Tao Liu <thomas.liu@ucloud.cn>

RDMA/cma: Fix listener leak in rdma_cma_listen_on_all() failure

If cma_listen_on_all() fails it leaves the per-device ID still on the
listen_list but the state is not set to RDMA_CM_ADDR_BOUND.

When the cmid is eventually destroyed cma_cancel_listens() is not called
due to the wrong state, however the per-device IDs are still holding the
refcount preventing the ID from being destroyed, thus deadlocking:

task:rping state:D stack: 0 pid:19605 ppid: 47036 flags:0x00000084
Call Trace:
__schedule+0x29a/0x780
? free_unref_page_commit+0x9b/0x110
schedule+0x3c/0xa0
schedule_timeout+0x215/0x2b0
? __flush_work+0x19e/0x1e0
wait_for_completion+0x8d/0xf0
_destroy_id+0x144/0x210 [rdma_cm]
ucma_close_id+0x2b/0x40 [rdma_ucm]
__destroy_id+0x93/0x2c0 [rdma_ucm]
? __xa_erase+0x4a/0xa0
ucma_destroy_id+0x9a/0x120 [rdma_ucm]
ucma_write+0xb8/0x130 [rdma_ucm]
vfs_write+0xb4/0x250
ksys_write+0xb5/0xd0
? syscall_trace_enter.isra.19+0x123/0x190
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Ensure that cma_listen_on_all() atomically unwinds its action under the
lock during error.

Fixes: c80a0c52d85c ("RDMA/cma: Add missing error handling of listen_id")
Link: https://lore.kernel.org/r/20210913093344.17230-1-thomas.liu@ucloud.cn
Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 2cc74e1e 08-Sep-2021 Christoph Lameter <cl@gentwo.de>

IB/cma: Do not send IGMP leaves for sendonly Multicast groups

ROCE uses IGMP for Multicast instead of the native Infiniband system where
joins are required in order to post messages on the Multicast group. On
Ethernet one can send Multicast messages to arbitrary addresses without
the need to subscribe to a group.

So ROCE correctly does not send IGMP joins during rdma_join_multicast().

F.e. in cma_iboe_join_multicast() we see:

	if (addr->sa_family == AF_INET) {
		if (gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP) {
			ib.rec.hop_limit = IPV6_DEFAULT_HOPLIMIT;
			if (!send_only) {
				err = cma_igmp_send(ndev, &ib.rec.mgid,
						    true);
			}
		}
	} else {

So the IGMP join is suppressed as it is unnecessary.

However, no such check is done in destroy_mc(), and therefore leaving a
sendonly multicast group will send an IGMP leave.

This means that the following scenario can lead to a multicast receiver
unexpectedly being unsubscribed from a MC group:

1. Sender thread does a sendonly join on MC group X. No IGMP join
is sent.

2. Receiver thread does a regular join on the same MC Group x.
IGMP join is sent and the receiver begins to get messages.

3. Sender thread terminates and destroys MC group X.
IGMP leave is sent and the receiver no longer receives data.

This patch adds the same logic for sendonly joins to destroy_mc() that is
also used in cma_iboe_join_multicast().

Fixes: ab15c95a17b3 ("IB/core: Support for CMA multicast join flags")
Link: https://lore.kernel.org/r/alpine.DEB.2.22.394.2109081340540.668072@gentwo.de
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 5f5a6509 12-Aug-2021 Håkon Bugge <haakon.bugge@oracle.com>

RDMA/core/sa_query: Retry SA queries

A MAD packet is sent as an unreliable datagram (UD). SA requests are sent
as MAD packets. As such, SA requests or responses may be silently dropped.

IB Core's MAD layer has a timeout and retry mechanism, which amongst
other, is used by RDMA CM. But it is not used by SA queries. The lack of
retries of SA queries leads to a long specified timeout, and an error being
returned in case of packet loss. The ULP or user-land process then has to
perform the retry.

Fix this by taking advantage of the MAD layer's retry mechanism.

First, a check against a zero timeout is added in rdma_resolve_route(). In
send_mad(), we set the MAD layer timeout to one tenth of the specified
timeout and the number of retries to 10. The special case when timeout is
less than 10 is handled.
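
Roughly, the split described above looks like this (hedged sketch with
made-up names; the real handling of the small-timeout corner case may
differ):

	#define EXAMPLE_SA_RETRIES	10

	static void example_split_timeout(unsigned long timeout_ms,
					  unsigned long *per_try_ms, int *retries)
	{
		if (timeout_ms < EXAMPLE_SA_RETRIES) {
			/* Too small to split; keep a single attempt. */
			*per_try_ms = timeout_ms;
			*retries = 0;
		} else {
			*per_try_ms = timeout_ms / EXAMPLE_SA_RETRIES;
			*retries = EXAMPLE_SA_RETRIES;
		}
	}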

With this fix:

# ucmatose -c 1000 -S 1024 -C 1

runs stable on an Infiniband fabric. Without this fix, we see an
intermittent behavior and it errors out with:

cmatose: event: RDMA_CM_EVENT_ROUTE_ERROR, error: -110

(110 is ETIMEDOUT)

Link: https://lore.kernel.org/r/1628784755-28316-1-git-send-email-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# db4657af 29-Jul-2021 Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>

RDMA/cma: Revert INIT-INIT patch

The net/sunrpc/xprtrdma module creates its QP using rdma_create_qp() and
immediately posts receives, implicitly assuming the QP is in the INIT state
and thus valid for ib_post_recv().

The patch noted in Fixes: removed the RESET->INIT modify from
rdma_create_qp(), breaking NFS rdma for verbs providers that fail the
ib_post_recv() for a bad state.

This situation was proven using kprobes in rvt_post_recv() and
rvt_modify_qp(). The traces showed that the rvt_post_recv() failed before
ANY modify QP and that the current state was RESET.

Fix by reverting the patch below.

Fixes: dc70f7c3ed34 ("RDMA/cma: Remove unnecessary INIT->INIT transition")
Link: https://lore.kernel.org/r/1627583182-81330-1-git-send-email-mike.marciniszyn@cornelisnetworks.com
Cc: Haakon Bugge <haakon.bugge@oracle.com>
Cc: Chuck Lever III <chuck.lever@oracle.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 3d828754 29-Jun-2021 Leon Romanovsky <leon@kernel.org>

RDMA/core: Always release restrack object

Change the location of rdma_restrack_del() to fix the bug where the
task_struct was acquired but not released, causing a resource leak.

ucma_create_id() {
ucma_alloc_ctx();
rdma_create_user_id() {
rdma_restrack_new();
rdma_restrack_set_name() {
rdma_restrack_attach_task.part.0(); <--- task_struct was gotten
}
}
ucma_destroy_private_ctx() {
ucma_put_ctx();
rdma_destroy_id() {
_destroy_id() <--- id_priv was freed
}
}
}

Fixes: 889d916b6f8a ("RDMA/core: Don't access cm_id after its destruction")
Link: https://lore.kernel.org/r/073ec27acb943ca8b6961663c47c5abe78a5c8cc.1624948948.git.leonro@nvidia.com
Reported-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 74f160ea 24-Jun-2021 Gerd Rausch <gerd.rausch@oracle.com>

RDMA/cma: Fix rdma_resolve_route() memory leak

Fix a memory leak when rdma_resolve_route() is called more than once on
the same rdma_cm_id.

This is possible if cma_query_handler() triggers the
RDMA_CM_EVENT_ROUTE_ERROR flow which puts the state machine back and
allows rdma_resolve_route() to be called again.

Link: https://lore.kernel.org/r/f6662b7b-bdb7-2706-1e12-47c61d3474b6@oracle.com
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# e84045ea 22-Jun-2021 Håkon Bugge <haakon.bugge@oracle.com>

RDMA/cma: Fix incorrect Packet Lifetime calculation

An approximation for the PacketLifeTime is half the local ACK timeout.
The encoding for both timers is logarithmic.

If the local ACK timeout is set, but zero, it means the timer is
disabled. In this case, we choose the CMA_IBOE_PACKET_LIFETIME value,
since 50% of infinite makes no sense.

Before this commit, the PacketLifeTime became 255 if local ACK
timeout was zero (not running).

Fixed by explicitly testing for timeout being zero.
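
Conceptually the fix amounts to the following (hedged sketch, not the exact
code): both values share the 4.096 us * 2^n encoding, so half the ACK
timeout is the encoded value minus one, and a zero (disabled) timeout falls
back to the default instead of underflowing to 255:

	#include <linux/types.h>

	static u8 example_packet_life_time(u8 local_ack_timeout, u8 default_lifetime)
	{
		/* timeout == 0 means the timer is disabled, not "infinite / 2" */
		return local_ack_timeout ? local_ack_timeout - 1 : default_lifetime;
	}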

Fixes: e1ee1e62bec4 ("RDMA/cma: Use ACK timeout for RoCE packetLifeTime")
Link: https://lore.kernel.org/r/1624371207-26710-1-git-send-email-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# ca0c448d 22-Jun-2021 Håkon Bugge <haakon.bugge@oracle.com>

RDMA/cma: Protect RMW with qp_mutex

The struct rdma_id_private contains three bit-fields, tos_set,
timeout_set, and min_rnr_timer_set. These are set by accessor functions
without any synchronization. If two or all accessor functions are invoked
in close proximity in time, there will be Read-Modify-Write from several
contexts to the same variable, and the result will be intermittent.

Fixed by protecting the bit-fields by the qp_mutex in the accessor
functions.

The consumer of timeout_set and min_rnr_timer_set is in
rdma_init_qp_attr(), which is called with qp_mutex held for connected
QPs. Explicit locking is added for the consumers of tos and tos_set.

This commit depends on ("RDMA/cma: Remove unnecessary INIT->INIT
transition"), since the call to rdma_init_qp_attr() from
cma_init_conn_qp() does not hold the qp_mutex.
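
A simplified sketch of the accessor pattern (hypothetical struct, not the
real rdma_id_private): the bit-fields share one storage unit, so writers
take the qp_mutex around the read-modify-write:

	#include <linux/mutex.h>
	#include <linux/types.h>

	struct example_id_priv {
		struct mutex qp_mutex;
		u8 tos;
		u8 tos_set:1,
		   timeout_set:1,
		   min_rnr_timer_set:1;
	};

	static void example_set_tos(struct example_id_priv *p, u8 tos)
	{
		mutex_lock(&p->qp_mutex);	/* serialize the bit-field RMW */
		p->tos = tos;
		p->tos_set = 1;
		mutex_unlock(&p->qp_mutex);
	}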

Fixes: 2c1619edef61 ("IB/cma: Define option to set ack timeout and pack tos_set")
Fixes: 3aeffc46afde ("IB/cma: Introduce rdma_set_min_rnr_timer()")
Link: https://lore.kernel.org/r/1624369197-24578-3-git-send-email-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# dc70f7c3 22-Jun-2021 Håkon Bugge <haakon.bugge@oracle.com>

RDMA/cma: Remove unnecessary INIT->INIT transition

In rdma_create_qp(), a connected QP will be transitioned to the INIT
state.

Afterwards, the QP will be transitioned to the RTR state by the
cma_modify_qp_rtr() function. But this function starts by performing an
ib_modify_qp() to the INIT state again, before another ib_modify_qp() is
performed to transition the QP to the RTR state.

Hence, there is no need to transition the QP to the INIT state in
rdma_create_qp().

Link: https://lore.kernel.org/r/1624369197-24578-2-git-send-email-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 889d916b 10-May-2021 Shay Drory <shayd@nvidia.com>

RDMA/core: Don't access cm_id after its destruction

restrack should only be attached to a cm_id while the ID has a valid
device pointer. It is set up when the device is first loaded, but not
cleared when the device is removed. There are also two copies of the device
pointer, one private and one in the public API, and these were left out of
sync.

Make everything go to NULL together and manipulate restrack right around
the device assignments.

Found by syzcaller:
BUG: KASAN: wild-memory-access in __list_del include/linux/list.h:112 [inline]
BUG: KASAN: wild-memory-access in __list_del_entry include/linux/list.h:135 [inline]
BUG: KASAN: wild-memory-access in list_del include/linux/list.h:146 [inline]
BUG: KASAN: wild-memory-access in cma_cancel_listens drivers/infiniband/core/cma.c:1767 [inline]
BUG: KASAN: wild-memory-access in cma_cancel_operation drivers/infiniband/core/cma.c:1795 [inline]
BUG: KASAN: wild-memory-access in cma_cancel_operation+0x1f4/0x4b0 drivers/infiniband/core/cma.c:1783
Write of size 8 at addr dead000000000108 by task syz-executor716/334

CPU: 0 PID: 334 Comm: syz-executor716 Not tainted 5.11.0+ #271
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0xbe/0xf9 lib/dump_stack.c:120
__kasan_report mm/kasan/report.c:400 [inline]
kasan_report.cold+0x5f/0xd5 mm/kasan/report.c:413
__list_del include/linux/list.h:112 [inline]
__list_del_entry include/linux/list.h:135 [inline]
list_del include/linux/list.h:146 [inline]
cma_cancel_listens drivers/infiniband/core/cma.c:1767 [inline]
cma_cancel_operation drivers/infiniband/core/cma.c:1795 [inline]
cma_cancel_operation+0x1f4/0x4b0 drivers/infiniband/core/cma.c:1783
_destroy_id+0x29/0x460 drivers/infiniband/core/cma.c:1862
ucma_close_id+0x36/0x50 drivers/infiniband/core/ucma.c:185
ucma_destroy_private_ctx+0x58d/0x5b0 drivers/infiniband/core/ucma.c:576
ucma_close+0x91/0xd0 drivers/infiniband/core/ucma.c:1797
__fput+0x169/0x540 fs/file_table.c:280
task_work_run+0xb7/0x100 kernel/task_work.c:140
exit_task_work include/linux/task_work.h:30 [inline]
do_exit+0x7da/0x17f0 kernel/exit.c:825
do_group_exit+0x9e/0x190 kernel/exit.c:922
__do_sys_exit_group kernel/exit.c:933 [inline]
__se_sys_exit_group kernel/exit.c:931 [inline]
__x64_sys_exit_group+0x2d/0x30 kernel/exit.c:931
do_syscall_64+0x2d/0x40 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 255d0c14b375 ("RDMA/cma: rdma_bind_addr() leaks a cma_dev reference count")
Link: https://lore.kernel.org/r/3352ee288fe34f2b44220457a29bfc0548686363.1620711734.git.leonro@nvidia.com
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# cb5cd0ea 18-Apr-2021 Shay Drory <shayd@nvidia.com>

RDMA/core: Add CM to restrack after successful attachment to a device

The device attach triggers addition of the CM_ID to the restrack DB.
However, when an error occurs, we release this device but defer the CM_ID
release. This causes a situation where restrack sees a CM_ID that
is no longer valid.

As a solution, add the CM_ID to the resource tracking DB only after the
attachment is finished.

Found by syzcaller:
infiniband syz0: added syz_tun
rdma_rxe: ignoring netdev event = 10 for syz_tun
infiniband syz0: set down
infiniband syz0: ib_query_port failed (-19)
restrack: ------------[ cut here ]------------
infiniband syz0: BUG: RESTRACK detected leak of resources
restrack: User CM_ID object allocated by syz-executor716 is not freed
restrack: ------------[ cut here ]------------

Fixes: b09c4d701220 ("RDMA/restrack: Improve readability in task name management")
Link: https://lore.kernel.org/r/ab93e56ba831eac65c322b3256796fa1589ec0bb.1618753862.git.leonro@nvidia.com
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 4d51c3d9 18-Apr-2021 Parav Pandit <parav@nvidia.com>

RDMA/cma: Skip device which doesn't support CM

A switchdev RDMA device does not support IB CM. When such a device is added to
the RDMA CM's device list and an application invokes rdma_listen(), cma
attempts to listen on that device even though it has the IB CM attribute
disabled.

Due to this, the rdma_listen() call fails to listen for other non-switchdev
devices as well.

A below error message can be seen.

infiniband mlx5_0: RDMA CMA: cma_listen_on_dev, error -38

A failing call flow is below.

cma_listen_on_all()
cma_listen_on_dev()
_cma_attach_to_dev()
rdma_listen() <- fails on a specific switchdev device

This is because rdma_listen() is hardwired to only work with iwarp or IB
CM compatible devices.

Hence, when an IB device doesn't support IB CM or IW CM, avoid adding such
a device to the cma list so rdma_listen() can't even be called.
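
A hedged sketch of the gating described above (illustrative helper, not the
exact patch): only treat the device as CM-capable if at least one port
supports IB CM or IW CM:

	#include <rdma/ib_verbs.h>

	static bool example_supports_cm(struct ib_device *device)
	{
		u32 port;

		rdma_for_each_port(device, port) {
			if (rdma_cap_ib_cm(device, port) ||
			    rdma_cap_iw_cm(device, port))
				return true;
		}
		return false;
	}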

Link: https://lore.kernel.org/r/f9cac00d52864ea7c61295e43fb64cf4db4fdae6.1618753862.git.leonro@nvidia.com
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 3aeffc46 31-Mar-2021 Håkon Bugge <haakon.bugge@oracle.com>

IB/cma: Introduce rdma_set_min_rnr_timer()

Introduce the ability for kernel ULPs to adjust the minimum RNR Retry
timer. The INIT -> RTR transition executed by RDMA CM will be used for
this adjustment. This avoids an additional ib_modify_qp() call.

rdma_set_min_rnr_timer() must be called before the call to rdma_connect()
on the active side and before the call to rdma_accept() on the passive
side.

The default value of RNR Retry timer is zero, which translates to 655
ms. When the receiver is not ready to accept a send message, it encodes
the RNR Retry timer value in the NAK. The requestor will then wait at
least the specified time value before retrying the send.

The 5-bit value to be supplied to the rdma_set_min_rnr_timer() is
documented in IBTA Table 45: "Encoding for RNR NAK Timer Field".
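
A hedged usage sketch for a kernel ULP on the active side (error handling
trimmed; rdma_set_min_rnr_timer(), rdma_connect() and IB_RNR_TIMER_000_32
are existing kernel symbols, the wrapper is illustrative):

	#include <rdma/rdma_cm.h>

	static int example_connect(struct rdma_cm_id *cm_id,
				   struct rdma_conn_param *param)
	{
		int ret;

		/* 5-bit IBTA Table 45 encoding; set before rdma_connect() */
		ret = rdma_set_min_rnr_timer(cm_id, IB_RNR_TIMER_000_32);
		if (ret)
			return ret;

		return rdma_connect(cm_id, param);
	}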

Link: https://lore.kernel.org/r/1617216194-12890-2-git-send-email-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Acked-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# b6eb7011 07-Apr-2021 Wenpeng Liang <liangwenpeng@huawei.com>

RDMA/core: Correct format of braces

Do following cleanups about braces:

- Add the necessary braces to maintain context alignment.
- Fix the open '{' that is not on the same line as "switch".
- Remove braces that are not necessary for single statement blocks.
- Fix "else" that doesn't follow close brace '}'.

Link: https://lore.kernel.org/r/1617783353-48249-6-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 1fb7f897 01-Mar-2021 Mark Bloch <mbloch@nvidia.com>

RDMA: Support more than 255 rdma ports

Current code uses many different types when dealing with a port of a RDMA
device: u8, unsigned int and u32. Switch to u32 to clean up the logic.

This allows us to make (at least) the core view consistent and use the
same type. Unfortunately not all places can be converted. Many uverbs
functions expect port to be u8 so keep those places in order not to break
UAPIs. HW/Spec defined values must also not be changed.

With the switch to u32 we now can support devices with more than 255
ports. U32_MAX is reserved to make control logic a bit easier to deal
with. As a device with U32_MAX ports probably isn't going to happen any
time soon this seems like a non issue.

When a device with more than 255 ports is created uverbs will report the
RDMA device as having 255 ports as this is the max currently supported.

The verbs interface is not changed yet because the IBTA spec limits the
port size in too many places to be u8 and all applications that rely on
verbs won't be able to cope with this change. At this stage, we are
extending only the interfaces that use the vendor channel.

Once the limitation is lifted, mlx5 in switchdev mode will be able to have
thousands of SFs created by the device. As the only instance of an RDMA
device that reports more than 255 ports will be a representor device, and
it exposes itself as a RAW Ethernet only device, CM/MAD/IPoIB and other
ULPs aren't affected by this change, and their sysfs interfaces that are
exposed to userspace can remain unchanged.

While here, clean up some alignment issues and remove unneeded sanity
checks (mainly in rdmavt).

Link: https://lore.kernel.org/r/20210301070420.439400-1-leon@kernel.org
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 87115951 14-Mar-2021 Gal Pressman <galpress@amazon.com>

RDMA/cma: Remove unused leftovers in cma code

Commit ee1c60b1bff8 ("IB/SA: Modify SA to implicitly cache Class Port
info") removed the class_port_info_context struct usage, remove a couple
of leftovers.

Link: https://lore.kernel.org/r/20210314143427.76101-1-galpress@amazon.com
Signed-off-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# e35ecb46 19-Feb-2021 Bernard Metzler <bmt@zurich.ibm.com>

RDMA/iwcm: Allow AFONLY binding for IPv6 addresses

Binding IPv6 address/port to AF_INET6 domain only is provided via
rdma_set_afonly(), but was not signalled to the provider. Applications
like NFS/RDMA bind the same port to both IPv4 and IPv6 addresses
simultaneously and thus rely on it working correctly.

Link: https://lore.kernel.org/r/20210219143441.1068-1-bmt@zurich.ibm.com
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# fe454dc3 11-Feb-2021 Avihai Horon <avihaih@nvidia.com>

RDMA/ucma: Fix use-after-free bug in ucma_create_uevent

ucma_process_join() allocates struct ucma_multicast mc and frees it if an
error occurs during its run. Specifically, if an error occurs in
copy_to_user(), a use-after-free might happen in the following scenario:

1. mc struct is allocated.
2. rdma_join_multicast() is called and succeeds. During its run,
cma_iboe_join_multicast() enqueues a work that will later use the
aforementioned mc struct.
3. copy_to_user() is called and fails.
4. mc struct is deallocated.
5. The work that was enqueued by cma_iboe_join_multicast() is run and
calls ucma_create_uevent() which tries to access mc struct (which is
freed by now).

Fix this bug by cancelling the work enqueued by cma_iboe_join_multicast().
Since cma_work_handler() frees struct cma_work, we don't use it in
cma_iboe_join_multicast() so we can safely cancel the work later.

The following syzkaller report revealed it:

BUG: KASAN: use-after-free in ucma_create_uevent+0x2dd/0x3f0 drivers/infiniband/core/ucma.c:272
Read of size 8 at addr ffff88810b3ad110 by task kworker/u8:1/108

CPU: 1 PID: 108 Comm: kworker/u8:1 Not tainted 5.10.0-rc6+ #257
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: rdma_cm cma_work_handler
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xbe/0xf9 lib/dump_stack.c:118
print_address_description.constprop.0+0x3e/0x60 mm/kasan/report.c:385
__kasan_report mm/kasan/report.c:545 [inline]
kasan_report.cold+0x1f/0x37 mm/kasan/report.c:562
ucma_create_uevent+0x2dd/0x3f0 drivers/infiniband/core/ucma.c:272
ucma_event_handler+0xb7/0x3c0 drivers/infiniband/core/ucma.c:349
cma_cm_event_handler+0x5d/0x1c0 drivers/infiniband/core/cma.c:1977
cma_work_handler+0xfa/0x190 drivers/infiniband/core/cma.c:2718
process_one_work+0x54c/0x930 kernel/workqueue.c:2272
worker_thread+0x82/0x830 kernel/workqueue.c:2418
kthread+0x1ca/0x220 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296

Allocated by task 359:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
kasan_set_track mm/kasan/common.c:56 [inline]
__kasan_kmalloc mm/kasan/common.c:461 [inline]
__kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:434
kmalloc include/linux/slab.h:552 [inline]
kzalloc include/linux/slab.h:664 [inline]
ucma_process_join+0x16e/0x3f0 drivers/infiniband/core/ucma.c:1453
ucma_join_multicast+0xda/0x140 drivers/infiniband/core/ucma.c:1538
ucma_write+0x1f7/0x280 drivers/infiniband/core/ucma.c:1724
vfs_write fs/read_write.c:603 [inline]
vfs_write+0x191/0x4c0 fs/read_write.c:585
ksys_write+0x1a1/0x1e0 fs/read_write.c:658
do_syscall_64+0x2d/0x40 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Freed by task 359:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
kasan_set_track+0x1c/0x30 mm/kasan/common.c:56
kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355
__kasan_slab_free+0x112/0x160 mm/kasan/common.c:422
slab_free_hook mm/slub.c:1544 [inline]
slab_free_freelist_hook mm/slub.c:1577 [inline]
slab_free mm/slub.c:3142 [inline]
kfree+0xb3/0x3e0 mm/slub.c:4124
ucma_process_join+0x22d/0x3f0 drivers/infiniband/core/ucma.c:1497
ucma_join_multicast+0xda/0x140 drivers/infiniband/core/ucma.c:1538
ucma_write+0x1f7/0x280 drivers/infiniband/core/ucma.c:1724
vfs_write fs/read_write.c:603 [inline]
vfs_write+0x191/0x4c0 fs/read_write.c:585
ksys_write+0x1a1/0x1e0 fs/read_write.c:658
do_syscall_64+0x2d/0x40 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The buggy address belongs to the object at ffff88810b3ad100
which belongs to the cache kmalloc-192 of size 192
The buggy address is located 16 bytes inside of
192-byte region [ffff88810b3ad100, ffff88810b3ad1c0)

Fixes: b5de0c60cc30 ("RDMA/cma: Fix use after free race in roce multicast join")
Link: https://lore.kernel.org/r/20210211090517.1278415-1-leon@kernel.org
Reported-by: Amit Matityahu <mitm@nvidia.com>
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 633d6102 28-Jan-2021 Christoph Lameter <cl@linux.com>

RDMA/ipoib: Remove racy Subnet Manager sendonly join checks

When a system receives a REREG event from the SM, then the SM information
in the kernel is marked as invalid and a request is sent to the SM to
update the information. The SM information is invalid in that time period.

However, receiving a REREG also occurs simultaneously in user space
applications that are now trying to rejoin the multicast groups. Some of
those may be sendonly multicast groups which are then failing.

If the SM information is invalid then ib_sa_sendonly_fullmem_support()
returns false. That is wrong because it just means that we do not know yet
if the potentially new SM supports sendonly joins.

Sendonly join was introduced in 2015 and all the Subnet managers have
supported it ever since. So there is no point in checking if a subnet
manager supports it.

Should an old opensm get a request for a sendonly join then the request
will fail. The code that is removed here accommodated that situation and
fell back to a full join.

Falling back to a full join is problematic in itself. The reason to use
the sendonly join was to reduce the traffic on the InfiniBand fabric;
otherwise one could have just stayed with the regular join. So this patch
may cause users of very old opensms to discover that lots of traffic
needlessly crosses their IB fabrics.

Link: https://lore.kernel.org/r/alpine.DEB.2.22.394.2101281845160.13303@www.lameter.com
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# e246b7c0 13-Dec-2020 Leon Romanovsky <leon@kernel.org>

RDMA/cma: Don't overwrite sgid_attr after device is released

As part of the cma_dev release, that pointer will be set to NULL. In case
it happens in rdma_bind_addr() (part of an error flow), the next call to
addr_handler() will have a call to cma_acquire_dev_by_src_ip() which will
overwrite sgid_attr without releasing it.

WARNING: CPU: 2 PID: 108 at drivers/infiniband/core/cma.c:606 cma_bind_sgid_attr drivers/infiniband/core/cma.c:606 [inline]
WARNING: CPU: 2 PID: 108 at drivers/infiniband/core/cma.c:606 cma_acquire_dev_by_src_ip+0x470/0x4b0 drivers/infiniband/core/cma.c:649
CPU: 2 PID: 108 Comm: kworker/u8:1 Not tainted 5.10.0-rc6+ #257
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: ib_addr process_one_req
RIP: 0010:cma_bind_sgid_attr drivers/infiniband/core/cma.c:606 [inline]
RIP: 0010:cma_acquire_dev_by_src_ip+0x470/0x4b0 drivers/infiniband/core/cma.c:649
Code: 66 d9 4a ff 4d 8b 6e 10 49 8d bd 1c 08 00 00 e8 b6 d6 4a ff 45 0f b6 bd 1c 08 00 00 41 83 e7 01 e9 49 fd ff ff e8 90 c5 29 ff <0f> 0b e9 80 fe ff ff e8 84 c5 29 ff 4c 89 f7 e8 2c d9 4a ff 4d 8b
RSP: 0018:ffff8881047c7b40 EFLAGS: 00010293
RAX: ffff888104789c80 RBX: 0000000000000001 RCX: ffffffff820b8ef8
RDX: 0000000000000000 RSI: ffffffff820b9080 RDI: ffff88810cd4c998
RBP: ffff8881047c7c08 R08: ffff888104789c80 R09: ffffed10209f4036
R10: ffff888104fa01ab R11: ffffed10209f4035 R12: ffff88810cd4c800
R13: ffff888105750e28 R14: ffff888108f0a100 R15: ffff88810cd4c998
FS: 0000000000000000(0000) GS:ffff888119c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000104e60005 CR4: 0000000000370ea0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
addr_handler+0x266/0x350 drivers/infiniband/core/cma.c:3190
process_one_req+0xa3/0x300 drivers/infiniband/core/addr.c:645
process_one_work+0x54c/0x930 kernel/workqueue.c:2272
worker_thread+0x82/0x830 kernel/workqueue.c:2418
kthread+0x1ca/0x220 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296

Fixes: ff11c6cd521f ("RDMA/cma: Introduce and use cma_acquire_dev_by_src_ip()")
Link: https://lore.kernel.org/r/20201213132940.345554-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# dd37d2f5 18-Nov-2020 Jason Gunthorpe <jgg@ziepe.ca>

RDMA/cma: Fix deadlock on &lock in rdma_cma_listen_on_all() error unwind

rdma_destroy_id() cannot be called under &lock - we must instead keep the
error'd ID around until &lock can be released, then destroy it.

This is complicated by the usual way listen IDs are destroyed through
cma_process_remove() which can run at any time and will asynchronously
destroy the same ID.

Remove the ID from visibility of cma_process_remove() before going down the
destroy path outside the locking.

Fixes: c80a0c52d85c ("RDMA/cma: Add missing error handling of listen_id")
Link: https://lore.kernel.org/r/20201118133756.GK244516@ziepe.ca
Reported-by: syzbot+1bc48bf7f78253f664a9@syzkaller.appspotmail.com
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# c80a0c52 04-Nov-2020 Leon Romanovsky <leon@kernel.org>

RDMA/cma: Add missing error handling of listen_id

Don't silently continue if rdma_listen() fails; instead destroy the
previously created CM_ID and return an error to the caller.

Fixes: d02d1f5359e7 ("RDMA/cma: Fix deadlock destroying listen requests")
Link: https://lore.kernel.org/r/20201104144008.3808124-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 071ba4cc 26-Oct-2020 Jason Gunthorpe <jgg@ziepe.ca>

RDMA: Add rdma_connect_locked()

There are two flows for handling RDMA_CM_EVENT_ROUTE_RESOLVED: either the
handler triggers a completion and another thread does rdma_connect(), or
the handler directly calls rdma_connect().

In all cases rdma_connect() needs to hold the handler_mutex, but when
handlers are invoked this is already held by the core code. This causes
ULPs using the 2nd method to deadlock.

Provide a rdma_connect_locked() and have all ULPs call it from their
handlers.
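
A sketch of the second flow after this change, assuming a typical ULP event
handler (conn_param setup omitted): because handler_mutex is already held
when the callback runs, the handler must use the _locked variant.

#include <rdma/rdma_cm.h>

/* Simplified ULP handler that connects directly from ROUTE_RESOLVED. */
static int ulp_cm_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
{
        struct rdma_conn_param param = {};

        switch (event->event) {
        case RDMA_CM_EVENT_ROUTE_RESOLVED:
                /* rdma_connect() would deadlock on handler_mutex here. */
                return rdma_connect_locked(id, &param);
        default:
                return 0;
        }
}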

Link: https://lore.kernel.org/r/0-v2-53c22d5c1405+33-rdma_connect_locking_jgg@nvidia.com
Reported-and-tested-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Fixes: 2a7cec538169 ("RDMA/cma: Fix locking for the RDMA_CM_CONNECT state")
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 1c15b4f2 23-Sep-2020 Avihai Horon <avihaih@nvidia.com>

RDMA/core: Modify enum ib_gid_type and enum rdma_network_type

Separate IB_GID_TYPE_IB and IB_GID_TYPE_ROCE into two different values, so
enum ib_gid_type will match the gid types of the new query GID table API
which will be introduced in the following patches.

This change in enum ib_gid_type also requires changing enum
rdma_network_type by separating the RDMA_NETWORK_IB and
RDMA_NETWORK_ROCE_V1 values.
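
After the split the two enums roughly take the following shape (a sketch of
the intended values; see include/rdma/ib_verbs.h for the authoritative
definitions):

enum ib_gid_type {
        IB_GID_TYPE_IB,
        IB_GID_TYPE_ROCE,               /* RoCE v1, now distinct from IB */
        IB_GID_TYPE_ROCE_UDP_ENCAP,     /* RoCE v2 */
        IB_GID_TYPE_SIZE,
};

enum rdma_network_type {
        RDMA_NETWORK_IB,
        RDMA_NETWORK_ROCE_V1,           /* separated from RDMA_NETWORK_IB */
        RDMA_NETWORK_IPV4,
        RDMA_NETWORK_IPV6,
};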

Link: https://lore.kernel.org/r/20200923165015.2491894-3-leon@kernel.org
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# eff74233 25-Sep-2020 Taehee Yoo <ap420073@gmail.com>

net: core: introduce struct netdev_nested_priv for nested interface infrastructure

Functions related to nested interface infrastructure such as
netdev_walk_all_{ upper | lower }_dev() pass both private functions
and a "data" pointer so callers can handle their own state.
At this point, the data pointer type is void *.
In order to make it easier to expand common variables and functions,
this new netdev_nested_priv structure is added.

In the following patch, a new member variable will be added into this
struct to fix the lockdep issue.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b09c4d70 21-Sep-2020 Leon Romanovsky <leon@kernel.org>

RDMA/restrack: Improve readability in task name management

Use rdma_restrack_set_name() and rdma_restrack_parent_name() instead of
tricky uses of rdma_restrack_attach_task()/rdma_restrack_uadd().

This uniformly makes all restrack entries added using rdma_restrack_add().

Link: https://lore.kernel.org/r/20200922091106.2152715-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# c34a23c2 21-Sep-2020 Leon Romanovsky <leon@kernel.org>

RDMA/restrack: Simplify restrack tracking in kernel flows

Have a single rdma_restrack_add() that adds an entry; there is no reason
to split the user/kernel case here, as rdma_restrack_set_task() is
responsible for this difference.

This patch prepares the code for the future requirement of making restrack
mandatory for managing ib objects.

Link: https://lore.kernel.org/r/20200922091106.2152715-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 13ef5539 21-Sep-2020 Leon Romanovsky <leon@kernel.org>

RDMA/restrack: Count references to the verbs objects

Refactor the restrack code to make sure the kref inside the restrack entry
properly kref's the object in which it is embedded. This slight change is
needed for future conversions of MR and QP which are refcounted before the
release and kfree.

The ideal flow from ib_core perspective as follows:
* Allocate ib_* structure with rdma_zalloc_*.
* Set everything that is known to ib_core to that newly created object.
* Initialize kref with restrack help
* Call to driver specific allocation functions.
* Insert into restrack DB
....
* Return and release restrack with restrack_put.

Largely this means a rdma_restrack_new() should be called near allocating
the containing structure.

Link: https://lore.kernel.org/r/20200922091106.2152715-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 60aaeffa 21-Sep-2020 Leon Romanovsky <leon@kernel.org>

RDMA/cma: Delete from restrack DB after successful destroy

Update the code to have similar destroy pattern like other IB objects.

This change creates asymmetry in the rdma_id_private create flow to make
sure that memory is managed by restrack.

Link: https://lore.kernel.org/r/20200922091106.2152715-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# b5de0c60 02-Sep-2020 Jason Gunthorpe <jgg@ziepe.ca>

RDMA/cma: Fix use after free race in roce multicast join

The roce path triggers a work queue that continues to touch the id_priv
but doesn't hold any reference on it. Further, unlike in the IB case, the
work queue is not fenced during rdma_destroy_id().

This can trigger a use after free if a destroy is triggered in the
incredibly narrow window after queue_work() and before the work starts
and obtains the handler_mutex.

The only purpose of this work queue is to run the ULP event callback from
the standard context, so switch the design to use the existing
cma_work_handler() scheme. This simplifies quite a lot of the flow:

- Use the cma_work_handler() callback to launch the work for roce. This
requires generating the event synchronously inside the
rdma_join_multicast(), which in turn means the dummy struct
ib_sa_multicast can become a simple stack variable.

- cma_work_handler() used the id_priv kref, so we can entirely eliminate
the kref inside struct cma_multicast. Since the cma_multicast never
leaks into an unprotected work queue the kfree can be done at the same
time as for IB.

- Eliminating the general multicast.ib requires using cma_set_mgid() in a
few places to recompute the mgid.

Fixes: 3c86aa70bf67 ("RDMA/cm: Add RDMA CM support for IBoE devices")
Link: https://lore.kernel.org/r/20200902081122.745412-9-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 3788d299 02-Sep-2020 Jason Gunthorpe <jgg@ziepe.ca>

RDMA/cma: Consolidate the destruction of a cma_multicast in one place

Two places were open coding this sequence; also pull in
cma_leave_roce_mc_group(), which was called only once.

Link: https://lore.kernel.org/r/20200902081122.745412-8-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 1bb5091d 02-Sep-2020 Jason Gunthorpe <jgg@ziepe.ca>

RDMA/cma: Remove dead code for kernel rdmacm multicast

There is no kernel user of RDMA CM multicast so this code managing the
multicast subscription of the kernel-only internal QP is dead. Remove it.

This makes the bug fixes in the next patches much simpler.

Link: https://lore.kernel.org/r/20200902081122.745412-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 7e85bcda 02-Sep-2020 Jason Gunthorpe <jgg@ziepe.ca>

RDMA/cma: Combine cma_ndev_work with cma_work

These are the same thing, except that cma_ndev_work doesn't have a state
transition. Signal no state transition by setting old_state and new_state
== 0.

In all cases the handler function should not be called once
rdma_destroy_id() has progressed passed setting the state.

Link: https://lore.kernel.org/r/20200902081122.745412-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 5cfbf929 02-Sep-2020 Jason Gunthorpe <jgg@ziepe.ca>

RDMA/cma: Remove cma_comp()

The only place that still uses it is rdma_join_multicast() which is only
doing a sanity check that the caller hasn't done something wrong and
doesn't need the spinlock.

At least in the case of rdma_join_multicast() the information it needs
will remain until the ID is destroyed once it enters these
states. Similarly there is no reason to check for these specific states in
the handler callback, instead use the usual check for a destroyed id under
the handler_mutex.

Link: https://lore.kernel.org/r/20200902081122.745412-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# d490ee52 02-Sep-2020 Jason Gunthorpe <jgg@ziepe.ca>

RDMA/cma: Fix locking for the RDMA_CM_LISTEN state

There is a strange unlocked read of the ID state when checking for
reuseaddr. This is because an ID cannot be reusable once it becomes a
listening ID. Instead of using the state to exclude reuse, just clear it
as part of rdma_listen()'s flow to convert reusable into not reusable.

Once an ID goes to listen there is no way back out, and the only use of
reusable is in the bind_list check.

Finally, update the checks under handler_mutex to use READ_ONCE and audit
that once RDMA_CM_LISTEN is observed in a req callback it is stable under
the handler_mutex.

Link: https://lore.kernel.org/r/20200902081122.745412-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 732d41c5 02-Sep-2020 Jason Gunthorpe <jgg@ziepe.ca>

RDMA/cma: Make the locking for automatic state transition more clear

Re-organize things so the state variable is not read unlocked. The first
attempt to go directly from ADDR_BOUND immediately tells us if the ID is
already bound; if we can't do that then the attempt inside
rdma_bind_addr() to go from IDLE to ADDR_BOUND confirms the ID needs
binding.

Link: https://lore.kernel.org/r/20200902081122.745412-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 2a7cec53 02-Sep-2020 Jason Gunthorpe <jgg@ziepe.ca>

RDMA/cma: Fix locking for the RDMA_CM_CONNECT state

It is currently a bit confusing, but the design is that if the handler_mutex
is held, and the state is RDMA_CM_CONNECT, then the state cannot leave
RDMA_CM_CONNECT without also serializing with the handler_mutex.

Make this clearer by adding a direct assertion, fixing the usage in
rdma_connect and generally using READ_ONCE to read the state value.
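
The resulting pattern looks roughly like the fragment below (a simplified
illustration of the rule, not the full diff): sample the state with
READ_ONCE() under handler_mutex and bail out unless the id is still in
RDMA_CM_CONNECT.

mutex_lock(&id_priv->handler_mutex);
if (READ_ONCE(id_priv->state) != RDMA_CM_CONNECT) {
        mutex_unlock(&id_priv->handler_mutex);
        return -EINVAL;
}
/* While handler_mutex is held, the state cannot leave RDMA_CM_CONNECT. */
/* ... connect/accept work happens here ... */
mutex_unlock(&id_priv->handler_mutex);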

Link: https://lore.kernel.org/r/20200902081122.745412-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# d114c6fe 18-Aug-2020 Jason Gunthorpe <jgg@ziepe.ca>

RDMA/cma: Add missing locking to rdma_accept()

In almost all cases rdma_accept() is called under the handler_mutex by
ULPs from their handler callbacks. The one exception was ucma which did
not get the handler_mutex.

To improve the understandability of the locking scheme, obtain the mutex
for ucma as well.

This improves how ucma works by allowing it to directly use handler_mutex
for some of its internal locking against the handler callbacks instead of
the global file->mut lock.

There does not seem to be a serious bug here, other than a DISCONNECT event
can be delivered concurrently with accept succeeding.

Link: https://lore.kernel.org/r/20200818120526.702120-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# df561f66 23-Aug-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

treewide: Use fallthrough pseudo-keyword

Replace the existing /* fall through */ comments and their variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings where applicable.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>


# f6a9d47a 23-Jul-2020 Jason Gunthorpe <jgg@ziepe.ca>

RDMA/cma: Execute rdma_cm destruction from a handler properly

When a rdma_cm_id needs to be destroyed after a handler callback fails,
part of the destruction pattern is open coded into each call site.

Unfortunately the blind assignment to state discards important information
needed to do cma_cancel_operation(). This results in active operations
being left running after rdma_destroy_id() completes, and the
use-after-free bugs from KASAN.

Consolidate this entire pattern into destroy_id_handler_unlock() and
manage the locking correctly. The state should be set to
RDMA_CM_DESTROYING under the handler_lock to atomically ensure no further
handlers are called.

Link: https://lore.kernel.org/r/20200723070707.1771101-5-leon@kernel.org
Reported-by: syzbot+08092148130652a6faae@syzkaller.appspotmail.com
Reported-by: syzbot+a929647172775e335941@syzkaller.appspotmail.com
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# cc9c0373 23-Jul-2020 Jason Gunthorpe <jgg@ziepe.ca>

RDMA/cma: Remove unneeded locking for req paths

The REQ flows are concerned that once the handler is called on the new
cm_id the ULP can choose to trigger a rdma_destroy_id() concurrently at
any time.

However, this is not true; while the ULP can call rdma_destroy_id(), it
immediately blocks on the handler_mutex which prevents anything harmful
from running concurrently.

Remove the confusing extra locking and refcounts and make the
handler_mutex protecting state during destroy more clear.

Link: https://lore.kernel.org/r/20200723070707.1771101-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 3647a28d 23-Jul-2020 Jason Gunthorpe <jgg@ziepe.ca>

RDMA/cma: Using the standard locking pattern when delivering the removal event

Whenever an event is delivered to the handler it should be done under the
handler_mutex and upon any non-zero return from the handler it should
trigger destruction of the cm_id.

cma_process_remove() skips some steps here; it is not necessarily wrong
since the state change should prevent any races, but it is confusing and
unnecessary.

Follow the standard pattern here, with the slight twist that the
transition to RDMA_CM_DEVICE_REMOVAL includes a cma_cancel_operation().

Link: https://lore.kernel.org/r/20200723070707.1771101-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# d54f23c0 23-Jul-2020 Jason Gunthorpe <jgg@ziepe.ca>

RDMA/cma: Simplify DEVICE_REMOVAL for internal_id

cma_process_remove() triggers an unconditional rdma_destroy_id() for
internal_id's and skips the event deliver and transition through
RDMA_CM_DEVICE_REMOVAL.

This is confusing and unnecessary. internal_id always has
cma_listen_handler() as the handler; have it catch the
RDMA_CM_DEVICE_REMOVAL event and directly consume it and signal removal.

This way the FSM sequence never skips the DEVICE_REMOVAL case and the
logic in this hard to test area is simplified.

Link: https://lore.kernel.org/r/20200723070707.1771101-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 730c8912 16-Jun-2020 Mark Zhang <markz@mellanox.com>

RDMA/cma: Protect bind_list and listen_list while finding matching cm id

The bind_list and listen_list must be accessed under a lock; add the
missing locking around the access in cm_ib_id_from_event().

In addition add lockdep asserts to make it clearer what the locking
semantic is here.

general protection fault: 0000 [#1] SMP NOPTI
CPU: 226 PID: 126135 Comm: kworker/226:1 Tainted: G OE 4.12.14-150.47-default #1 SLE15
Hardware name: Cray Inc. Windom/Windom, BIOS 0.8.7 01-10-2020
Workqueue: ib_cm cm_work_handler [ib_cm]
task: ffff9c5a60a1d2c0 task.stack: ffffc1d91f554000
RIP: 0010:cma_ib_req_handler+0x3f1/0x11b0 [rdma_cm]
RSP: 0018:ffffc1d91f557b40 EFLAGS: 00010286
RAX: deacffffffffff30 RBX: 0000000000000001 RCX: ffff9c2af5bb6000
RDX: 00000000000000a9 RSI: ffff9c5aa4ed2f10 RDI: ffffc1d91f557b08
RBP: ffffc1d91f557d90 R08: ffff9c340cc80000 R09: ffff9c2c0f901900
R10: 0000000000000000 R11: 0000000000000001 R12: deacffffffffff30
R13: ffff9c5a48aeec00 R14: ffffc1d91f557c30 R15: ffff9c5c2eea3688
FS: 0000000000000000(0000) GS:ffff9c5c2fa80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002b5cc03fa320 CR3: 0000003f8500a000 CR4: 00000000003406e0
Call Trace:
? rdma_addr_cancel+0xa0/0xa0 [ib_core]
? cm_process_work+0x28/0x140 [ib_cm]
cm_process_work+0x28/0x140 [ib_cm]
? cm_get_bth_pkey.isra.44+0x34/0xa0 [ib_cm]
cm_work_handler+0xa06/0x1a6f [ib_cm]
? __switch_to_asm+0x34/0x70
? __switch_to_asm+0x34/0x70
? __switch_to_asm+0x40/0x70
? __switch_to_asm+0x34/0x70
? __switch_to_asm+0x40/0x70
? __switch_to_asm+0x34/0x70
? __switch_to_asm+0x40/0x70
? __switch_to+0x7c/0x4b0
? __switch_to_asm+0x40/0x70
? __switch_to_asm+0x34/0x70
process_one_work+0x1da/0x400
worker_thread+0x2b/0x3f0
? process_one_work+0x400/0x400
kthread+0x118/0x140
? kthread_create_on_node+0x40/0x40
ret_from_fork+0x22/0x40
Code: 00 66 83 f8 02 0f 84 ca 05 00 00 49 8b 84 24 d0 01 00 00 48 85 c0 0f 84 68 07 00 00 48 2d d0 01
00 00 49 89 c4 0f 84 59 07 00 00 <41> 0f b7 44 24 20 49 8b 77 50 66 83 f8 0a 75 9e 49 8b 7c 24 28

Fixes: 4c21b5bcef73 ("IB/cma: Add net_dev and private data checks to RDMA CM")
Link: https://lore.kernel.org/r/20200616104304.2426081-1-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 278f74b3 30-May-2020 Chuck Lever <chuck.lever@oracle.com>

RDMA/core: Move and rename trace_cm_id_create()

The restrack ID for an rdma_cm_id is not assigned until it is
associated with a device.

Here's an example I captured while testing NFS/RDMA's support for
DEVICE_REMOVAL. The new tracepoint name is "cm_id_attach".

<...>-4261 [001] 366.581299: cm_event_handler: cm.id=0 src=0.0.0.0:45919 dst=192.168.2.55:20049 tos=0 ADDR_ERROR (1/-19)
<...>-4261 [001] 366.581304: cm_event_done: cm.id=0 src=0.0.0.0:45919 dst=192.168.2.55:20049 tos=0 ADDR_ERROR consumer returns 0
<...>-1950 [000] 366.581309: cm_id_destroy: cm.id=0 src=0.0.0.0:45919 dst=192.168.2.55:20049 tos=0
<...>-7 [001] 369.589400: cm_event_handler: cm.id=0 src=0.0.0.0:49023 dst=192.168.2.55:20049 tos=0 ADDR_ERROR (1/-19)
<...>-7 [001] 369.589404: cm_event_done: cm.id=0 src=0.0.0.0:49023 dst=192.168.2.55:20049 tos=0 ADDR_ERROR consumer returns 0
<...>-1950 [000] 369.589407: cm_id_destroy: cm.id=0 src=0.0.0.0:49023 dst=192.168.2.55:20049 tos=0
<...>-4261 [001] 372.597650: cm_id_attach: cm.id=0 src=192.168.2.51:47492 dst=192.168.2.55:20049 device=mlx4_0
<...>-4261 [001] 372.597652: cm_event_handler: cm.id=0 src=192.168.2.51:47492 dst=192.168.2.55:20049 tos=0 ADDR_RESOLVED (0/0)
<...>-4261 [001] 372.597654: cm_event_done: cm.id=0 src=192.168.2.51:47492 dst=192.168.2.55:20049 tos=0 ADDR_RESOLVED consumer returns 0
<...>-4261 [001] 372.597738: cm_event_handler: cm.id=0 src=192.168.2.51:47492 dst=192.168.2.55:20049 tos=0 ROUTE_RESOLVED (2/0)
<...>-4261 [001] 372.597740: cm_event_done: cm.id=0 src=192.168.2.51:47492 dst=192.168.2.55:20049 tos=0 ROUTE_RESOLVED consumer returns 0
<...>-4691 [007] 372.600101: cm_qp_create: cm.id=0 src=192.168.2.51:47492 dst=192.168.2.55:20049 tos=0 pd.id=2 qp_type=RC send_wr=4091 recv_wr=256 qp_num=530 rc=0
<...>-4691 [007] 372.600207: cm_send_req: cm.id=0 src=192.168.2.51:47492 dst=192.168.2.55:20049 tos=0 qp_num=530
<...>-185 [002] 372.601212: cm_send_mra: cm.id=0 src=192.168.2.51:47492 dst=192.168.2.55:20049 tos=0
<...>-185 [002] 372.601362: cm_send_rtu: cm.id=0 src=192.168.2.51:47492 dst=192.168.2.55:20049 tos=0
<...>-185 [002] 372.601372: cm_event_handler: cm.id=0 src=192.168.2.51:47492 dst=192.168.2.55:20049 tos=0 ESTABLISHED (9/0)
<...>-185 [002] 372.601379: cm_event_done: cm.id=0 src=192.168.2.51:47492 dst=192.168.2.55:20049 tos=0 ESTABLISHED consumer returns 0

Fixes: ed999f820a6c ("RDMA/cma: Add trace points in RDMA Connection Manager")
Link: https://lore.kernel.org/r/20200530174934.21362.56754.stgit@manet.1015granger.net
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 8094ba0a 26-May-2020 Leon Romanovsky <leon@kernel.org>

RDMA/cma: Provide ECE reject reason

IBTA declares "vendor option not supported" reject reason in REJ messages
if passive side doesn't want to accept proposed ECE options.

Because ECE is managed by userspace, there is a need to let users provide
such a reject reason.

Link: https://lore.kernel.org/r/20200526103304.196371-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 0cb15372 26-May-2020 Leon Romanovsky <leon@kernel.org>

RDMA/cma: Connect ECE to rdma_accept

The rdma_accept() is called by both passive and active sides of a CMID
connection to mark readiness to start data transfer. For the passive side,
this is called explicitly; for the active side, it is called implicitly
while receiving the REP message.

Provide ECE data to rdma_accept function needed for passive side to send
that REP message.

Link: https://lore.kernel.org/r/20200526103304.196371-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# a20652e1 26-May-2020 Leon Romanovsky <leon@kernel.org>

RDMA/cm: Send and receive ECE parameter over the wire

ECE parameters are exchanged through REQ->REP/SIDR_REP messages; this
patch adds the data to provide to the other side of the CMID communication
channel.

Link: https://lore.kernel.org/r/20200526103304.196371-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 34e2ab57 26-May-2020 Leon Romanovsky <leon@kernel.org>

RDMA/ucma: Extend ucma_connect to receive ECE parameters

Active side of CMID initiates connection through librdmacm's
rdma_connect() and kernel's ucma_connect(). Extend UCMA interface to
handle those new parameters.

Link: https://lore.kernel.org/r/20200526103304.196371-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# f6653405 03-May-2020 Mark Zhang <markz@mellanox.com>

RDMA/cma: Initialize the flow label of CM's route path record

If the flow label is not set by the user or it's not IPv4, initialize it
with the cma src/dst based on Kernighan and Ritchie's hash function.
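
A hedged sketch of a Kernighan and Ritchie style hash folded into the
20-bit IPv6 flow label space; the helper name and exact mixing below are
illustrative assumptions, not the upstream implementation:

#include <linux/types.h>

/* Illustrative only: K&R-style hash of the src/dst GID bytes. */
static u32 flow_label_sketch(const u8 *sgid, const u8 *dgid, size_t len)
{
        u32 hash = 0;
        size_t i;

        for (i = 0; i < len; i++)
                hash = hash * 31 + (sgid[i] ^ dgid[i]);

        return hash & 0xFFFFF;  /* the IPv6 flow label is 20 bits wide */
}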

Link: https://lore.kernel.org/r/20200504051935.269708-5-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 11a0ae4c 21-Apr-2020 Jason Gunthorpe <jgg@ziepe.ca>

RDMA: Allow ib_client's to fail when add() is called

When a client is added it isn't allowed to fail, but all the clients have
various failure paths within their add routines.

This creates the very fringe condition where the client was added, failed
during add and didn't set the client_data. The core code will then still
call other client_data centric ops like remove(), rename(), get_nl_info(),
and get_net_dev_by_params() with NULL client_data - which is confusing and
unexpected.

If the add() callback fails, then do not call any more client ops for the
device, even remove.

Remove all the now redundant checks for NULL client_data in ops callbacks.

Update all the add() callbacks to return error codes
appropriately. EOPNOTSUPP is used for cases where the ULP does not support
the ib_device - eg because it only works with IB.
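
An illustrative client-side sketch under the new contract (a hypothetical
ULP; only the relevant ops shown): add() may now return an error such as
-EOPNOTSUPP, and the core then skips every other callback for that device.

#include <rdma/ib_verbs.h>

static int example_add_one(struct ib_device *device)
{
        /* Hypothetical ULP that only supports IB links on port 1. */
        if (!rdma_protocol_ib(device, 1))
                return -EOPNOTSUPP;

        /* ... allocate per-device state, ib_set_client_data(), etc. ... */
        return 0;
}

static void example_remove_one(struct ib_device *device, void *client_data)
{
        /* Only called when add() succeeded, so client_data is valid. */
}

static struct ib_client example_client = {
        .name   = "example",
        .add    = example_add_one,
        .remove = example_remove_one,
};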

Link: https://lore.kernel.org/r/20200421172440.387069-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# dd302ee4 13-Apr-2020 Leon Romanovsky <leon@kernel.org>

RDMA/cma: Limit the scope of rdma_is_consumer_reject function

The function is local to cma.c, so let's limit its scope.

Link: https://lore.kernel.org/r/20200413132323.930869-1-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 987914ab 17-Mar-2020 Avihai Horon <avihaih@mellanox.com>

RDMA/cm: Update num_paths in cma_resolve_iboe_route error flow

After a successful allocation of path_rec, num_paths is set to 1, but any
error after such allocation will leave num_paths uncleared.

This leads to dereferencing a NULL pointer later on. Hence, num_paths
needs to be set back to 0 if such an error occurs.

The following crash from syzkaller revealed it.

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
CPU: 0 PID: 357 Comm: syz-executor060 Not tainted 4.18.0+ #311
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
RIP: 0010:ib_copy_path_rec_to_user+0x94/0x3e0
Code: f1 f1 f1 f1 c7 40 0c 00 00 f4 f4 65 48 8b 04 25 28 00 00 00 48 89
45 c8 31 c0 e8 d7 60 24 ff 48 8d 7b 4c 48 89 f8 48 c1 e8 03 <42> 0f b6
14 30 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85
RSP: 0018:ffff88006586f980 EFLAGS: 00010207
RAX: 0000000000000009 RBX: 0000000000000000 RCX: 1ffff1000d5fe475
RDX: ffff8800621e17c0 RSI: ffffffff820d45f9 RDI: 000000000000004c
RBP: ffff88006586fa50 R08: ffffed000cb0df73 R09: ffffed000cb0df72
R10: ffff88006586fa70 R11: ffffed000cb0df73 R12: 1ffff1000cb0df30
R13: ffff88006586fae8 R14: dffffc0000000000 R15: ffff88006aff2200
FS: 00000000016fc880(0000) GS:ffff88006d000000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000040 CR3: 0000000063fec000 CR4: 00000000000006b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
? ib_copy_path_rec_from_user+0xcc0/0xcc0
? __mutex_unlock_slowpath+0xfc/0x670
? wait_for_completion+0x3b0/0x3b0
? ucma_query_route+0x818/0xc60
ucma_query_route+0x818/0xc60
? ucma_listen+0x1b0/0x1b0
? sched_clock_cpu+0x18/0x1d0
? sched_clock_cpu+0x18/0x1d0
? ucma_listen+0x1b0/0x1b0
? ucma_write+0x292/0x460
ucma_write+0x292/0x460
? ucma_close_id+0x60/0x60
? sched_clock_cpu+0x18/0x1d0
? sched_clock_cpu+0x18/0x1d0
__vfs_write+0xf7/0x620
? ucma_close_id+0x60/0x60
? kernel_read+0x110/0x110
? time_hardirqs_on+0x19/0x580
? lock_acquire+0x18b/0x3a0
? finish_task_switch+0xf3/0x5d0
? _raw_spin_unlock_irq+0x29/0x40
? _raw_spin_unlock_irq+0x29/0x40
? finish_task_switch+0x1be/0x5d0
? __switch_to_asm+0x34/0x70
? __switch_to_asm+0x40/0x70
? security_file_permission+0x172/0x1e0
vfs_write+0x192/0x460
ksys_write+0xc6/0x1a0
? __ia32_sys_read+0xb0/0xb0
? entry_SYSCALL_64_after_hwframe+0x3e/0xbe
? do_syscall_64+0x1d/0x470
do_syscall_64+0x9e/0x470
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 3c86aa70bf67 ("RDMA/cm: Add RDMA CM support for IBoE devices")
Link: https://lore.kernel.org/r/20200318101741.47211-1-leon@kernel.org
Signed-off-by: Avihai Horon <avihaih@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 32ac9e43 27-Feb-2020 Jason Gunthorpe <jgg@ziepe.ca>

RDMA/cma: Teach lockdep about the order of rtnl and lock

This lock ordering only happens when bonding is enabled and a certain
bonding related event fires. However, since it can happen this is a global
restriction on lock ordering.

Teach lockdep about the order directly and unconditionally so bugs here
are found quickly.
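
The shape of the change, sketched (the real code lives in cma_init(), and
'lock' here stands for cma.c's global mutex): take the two locks once, in
the required order, so lockdep records the rtnl -> &lock dependency even if
no bonding event ever fires.

/* Fragment from init time; 'lock' is cma.c's global mutex. */
if (IS_ENABLED(CONFIG_LOCKDEP)) {
        rtnl_lock();
        mutex_lock(&lock);
        mutex_unlock(&lock);
        rtnl_unlock();
}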

See https://syzkaller.appspot.com/bug?extid=55de90ab5f44172b0c90

Link: https://lore.kernel.org/r/20200227203651.GA27185@ziepe.ca
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# e4103312 12-Feb-2020 Parav Pandit <parav@mellanox.com>

Revert "RDMA/cma: Simplify rdma_resolve_addr() error flow"

This reverts commit 219d2e9dfda9431b808c28d5efc74b404b95b638.

The call chain below requires the cm_id_priv's destination address to be
set up before performing rdma_bind_addr(). Otherwise source port
allocation fails as cma_port_is_unique() no longer sees the correct tuple
to allow duplicate users of the source port.

rdma_resolve_addr()
cma_bind_addr()
rdma_bind_addr()
cma_get_port()
cma_alloc_any_port()
cma_port_is_unique() <- compared with zero daddr

This can result in false failures to connect, particularly if the source
port range is restricted.

Fixes: 219d2e9dfda9 ("RDMA/cma: Simplify rdma_resolve_addr() error flow")
Link: https://lore.kernel.org/r/20200212072635.682689-4-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 43fb5892 26-Jan-2020 Parav Pandit <parav@mellanox.com>

RDMA/cma: Use refcount API to reflect refcount

Use a refcount_t for atomics being used as a refcount.

Link: https://lore.kernel.org/r/20200126142652.104803-8-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# e368d23f 26-Jan-2020 Parav Pandit <parav@mellanox.com>

RDMA/cma: Rename cma_device ref/deref helpers to to get/put

Helper functions which increment/decrement reference count of a
structure read better when they are named with the get/put suffix.

Hence, rename cma_ref/deref_id() to cma_id_get/put(). Also use
cma_get_id() wrapper to find the balancing put() calls.

Link: https://lore.kernel.org/r/20200126142652.104803-7-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# be439912 26-Jan-2020 Parav Pandit <parav@mellanox.com>

RDMA/cma: Use refcount API to reflect refcount

Use the refcount variant to capture the reference counting of the cma
device structure.
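
The conversion follows the usual refcount_t pattern, sketched below with
assumed field names: refcount_set() at creation, refcount_inc() per
additional user, and refcount_dec_and_test() on the final put.

#include <linux/refcount.h>
#include <linux/completion.h>

struct cma_device_sketch {
        struct completion       comp;
        refcount_t              refcount;       /* was: atomic_t */
};

static void cma_dev_get_sketch(struct cma_device_sketch *dev)
{
        refcount_inc(&dev->refcount);
}

static void cma_dev_put_sketch(struct cma_device_sketch *dev)
{
        if (refcount_dec_and_test(&dev->refcount))
                complete(&dev->comp);   /* last user: wake the releaser */
}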

Link: https://lore.kernel.org/r/20200126142652.104803-6-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 5ff8c8fa 26-Jan-2020 Parav Pandit <parav@mellanox.com>

RDMA/cma: Rename cma_device ref/deref helpers to to get/put

Helper functions which increment/decrement reference count of the
structure read better when they are named with the get/put suffix.

Hence, rename cma_ref/deref_dev() to cma_dev_get/put().

Link: https://lore.kernel.org/r/20200126142652.104803-5-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# cc055dd3 26-Jan-2020 Parav Pandit <parav@mellanox.com>

RDMA/cma: Use RDMA device port iterator

Use RDMA device port iterator to avoid open coding.

Link: https://lore.kernel.org/r/20200126142652.104803-4-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 081ea519 26-Jan-2020 Parav Pandit <parav@mellanox.com>

RDMA/cma: Use a helper function to enqueue resolve work items

To avoid errors when attaching ownership of a work item and its cm_id
refcount (which is decremented in the work handler), tie them together in a
single helper function. This also avoids code duplication.

Link: https://lore.kernel.org/r/20200126142652.104803-3-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# b4fb4cc5 26-Jan-2020 Parav Pandit <parav@mellanox.com>

RDMA/cma: Fix unbalanced cm_id reference count during address resolve

The commit below missed the AF_IB and loopback code flow in
rdma_resolve_addr(). This leads to an unbalanced cm_id refcount in
cma_work_handler(), which puts a refcount that was not incremented prior
to queuing the work.

A call trace is observed with such code flow:

BUG: unable to handle kernel NULL pointer dereference at (null)
[<ffffffff96b67e16>] __mutex_lock_slowpath+0x166/0x1d0
[<ffffffff96b6715f>] mutex_lock+0x1f/0x2f
[<ffffffffc0beabb5>] cma_work_handler+0x25/0xa0
[<ffffffff964b9ebf>] process_one_work+0x17f/0x440
[<ffffffff964baf56>] worker_thread+0x126/0x3c0

Hence, hold the cm_id reference when scheduling the resolve work item.

Fixes: 722c7b2bfead ("RDMA/{cma, core}: Avoid callback on rdma_addr_cancel()")
Link: https://lore.kernel.org/r/20200126142652.104803-2-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# ed999f82 18-Dec-2019 Chuck Lever <chuck.lever@oracle.com>

RDMA/cma: Add trace points in RDMA Connection Manager

Record state transitions as each connection is established. The IP address
of both peers and the Type of Service is reported. These trace points are
not in performance hot paths.

Also, record each cm_event_handler call to ULPs. This eliminates the need
for each ULP to add its own similar trace point in its CM event handler
function.

These new trace points appear in a new trace subsystem called "rdma_cma".

Sample events:

<...>-220 [004] 121.430733: cm_id_create: cm.id=0
<...>-472 [003] 121.430991: cm_event_handler: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 ADDR_RESOLVED (0/0)
<...>-472 [003] 121.430995: cm_event_done: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 result=0
<...>-472 [003] 121.431172: cm_event_handler: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 ROUTE_RESOLVED (2/0)
<...>-472 [003] 121.431174: cm_event_done: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 result=0
<...>-220 [004] 121.433480: cm_qp_create: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 pd.id=2 qp_type=RC send_wr=4091 recv_wr=256 qp_num=521 rc=0
<...>-220 [004] 121.433577: cm_send_req: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 qp_num=521
kworker/1:2-973 [001] 121.436190: cm_send_mra: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0
kworker/1:2-973 [001] 121.436340: cm_send_rtu: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0
kworker/1:2-973 [001] 121.436359: cm_event_handler: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 ESTABLISHED (9/0)
kworker/1:2-973 [001] 121.436365: cm_event_done: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 result=0
<...>-1975 [005] 123.161954: cm_disconnect: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0
<...>-1975 [005] 123.161974: cm_sent_dreq: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0
<...>-220 [004] 123.162102: cm_disconnect: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0
kworker/0:1-13 [000] 123.162391: cm_event_handler: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 DISCONNECTED (10/0)
kworker/0:1-13 [000] 123.162393: cm_event_done: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 result=0
<...>-220 [004] 123.164456: cm_qp_destroy: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 qp_num=521
<...>-220 [004] 123.165290: cm_id_destroy: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0

Some features to note:
- restracker ID of the rdma_cm_id is tagged on each trace event
- The source and destination IP addresses and TOS are reported
- CM event upcalls are shown with decoded event and status
- CM state transitions are reported
- rdma_cm_id lifetime events are captured
- The latency of ULP CM event handlers is reported
- Lifetime events of associated QPs are reported
- Device removal and insertion is reported

This patch is based on previous work by:

Saeed Mahameed <saeedm@mellanox.com>
Mukesh Kacker <mukesh.kacker@oracle.com>
Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Aron Silverton <aron.silverton@oracle.com>
Avinash Repaka <avinash.repaka@oracle.com>
Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

Link: https://lore.kernel.org/r/20191218201810.30584.3052.stgit@manet.1015granger.net
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 44a7b675 05-Dec-2019 Chuhong Yuan <hslester96@gmail.com>

RDMA/cma: add missed unregister_pernet_subsys in init failure

The driver forgets to call unregister_pernet_subsys() in the error path
of cma_init().
Add the missed call to fix it.

Fixes: 4be74b42a6d0 ("IB/cma: Separate port allocation to network namespaces")
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Link: https://lore.kernel.org/r/20191206012426.12744-1-hslester96@gmail.com
Signed-off-by: Doug Ledford <dledford@redhat.com>


# e1ee1e62 30-Oct-2019 Dag Moxnes <dag.moxnes@oracle.com>

RDMA/cma: Use ACK timeout for RoCE packetLifeTime

The cma is currently using a hard-coded value, CMA_IBOE_PACKET_LIFETIME,
for the PacketLifeTime, as it can not be determined from the network.
This value might not be optimal for all networks.

The cma module supports the function rdma_set_ack_timeout to set the ACK
timeout for a QP associated with a connection. As per IBTA 12.7.34 local
ACK timeout = (2 * PacketLifeTime + Local CA’s ACK delay). Assuming a
negligible local ACK delay, we can use PacketLifeTime = local ACK
timeout/2 as a reasonable approximation for RoCE networks.
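
Since both values are IBTA exponents (time = 4.096 us * 2^value), halving
the time means subtracting one from the exponent. A sketch of the resulting
choice in the RoCE route setup (simplified; field names follow cma.c but
the exact upstream diff may differ):

/* Fragment: pick the path record PacketLifeTime for a RoCE route. */
if (id_priv->timeout_set && id_priv->timeout)
        route->path_rec->packet_life_time = id_priv->timeout - 1;
else
        route->path_rec->packet_life_time = CMA_IBOE_PACKET_LIFETIME;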

Link: https://lore.kernel.org/r/1572439440-17416-1-git-send-email-dag.moxnes@oracle.com
Signed-off-by: Dag Moxnes <dag.moxnes@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 24f52149 20-Oct-2019 Leon Romanovsky <leon@kernel.org>

RDMA/cm: Update copyright together with SPDX tag

Add Mellanox to the list of copyright holders and replace the copyright
boilerplate with the relevant SPDX tag.

Link: https://lore.kernel.org/r/20191020071559.9743-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# d3bd9396 15-Oct-2019 Parav Pandit <parav@mellanox.com>

IB/cma: Honor traffic class from lower netdevice for RoCE

When a macvlan netdevice is used for RoCE, consider the tos->prio->tc
mapping as SL using its lower netdevice.

1. If the lower netdevice is a VLAN netdevice, consider the VLAN netdevice
and its parent netdevice for mapping
2. If the lower netdevice is not a VLAN netdevice, consider tc mapping
directly from the lower netdevice

Link: https://lore.kernel.org/r/20191015072058.17347-1-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# b66f31ef 30-Sep-2019 Bart Van Assche <bvanassche@acm.org>

RDMA/iwcm: Fix a lock inversion issue

This patch fixes the lock inversion complaint:

============================================
WARNING: possible recursive locking detected
5.3.0-rc7-dbg+ #1 Not tainted
--------------------------------------------
kworker/u16:6/171 is trying to acquire lock:
00000000035c6e6c (&id_priv->handler_mutex){+.+.}, at: rdma_destroy_id+0x78/0x4a0 [rdma_cm]

but task is already holding lock:
00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm]

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&id_priv->handler_mutex);
lock(&id_priv->handler_mutex);

*** DEADLOCK ***

May be due to missing lock nesting notation

3 locks held by kworker/u16:6/171:
#0: 00000000e2eaa773 ((wq_completion)iw_cm_wq){+.+.}, at: process_one_work+0x472/0xac0
#1: 000000001efd357b ((work_completion)(&work->work)#3){+.+.}, at: process_one_work+0x476/0xac0
#2: 00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm]

stack backtrace:
CPU: 3 PID: 171 Comm: kworker/u16:6 Not tainted 5.3.0-rc7-dbg+ #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Workqueue: iw_cm_wq cm_work_handler [iw_cm]
Call Trace:
dump_stack+0x8a/0xd6
__lock_acquire.cold+0xe1/0x24d
lock_acquire+0x106/0x240
__mutex_lock+0x12e/0xcb0
mutex_lock_nested+0x1f/0x30
rdma_destroy_id+0x78/0x4a0 [rdma_cm]
iw_conn_req_handler+0x5c9/0x680 [rdma_cm]
cm_work_handler+0xe62/0x1100 [iw_cm]
process_one_work+0x56d/0xac0
worker_thread+0x7a/0x5d0
kthread+0x1bc/0x210
ret_from_fork+0x24/0x30

This is not a bug as there are actually two lock classes here.

Link: https://lore.kernel.org/r/20190930231707.48259-3-bvanassche@acm.org
Fixes: de910bd92137 ("RDMA/cma: Simplify locking needed for serialization of callbacks")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# a6e4d254 02-Sep-2019 Håkon Bugge <haakon.bugge@oracle.com>

RDMA/cma: Fix false error message

In addr_handler(), assuming status == 0 and the device has already been
acquired (id_priv->cma_dev != NULL), we get the following incorrect
"error" message:

RDMA CM: ADDR_ERROR: failed to resolve IP. status 0

Fixes: 498683c6a7ee ("IB/cma: Add debug messages to error flows")
Link: https://lore.kernel.org/r/20190902092731.1055757-1-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# a7bfb93f 18-Aug-2019 zhengbin <zhengbin13@huawei.com>

RDMA/cma: fix null-ptr-deref Read in cma_cleanup

In cma_init, if cma_configfs_init fails, the previously allocated memory
needs to be freed and a failure returned; otherwise a null-ptr-deref read
will be triggered in cma_cleanup.

cma_cleanup
cma_configfs_exit
configfs_unregister_subsystem

Fixes: 045959db65c6 ("IB/cma: Add configfs for rdma_cm")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Link: https://lore.kernel.org/r/1566188859-103051-1-git-send-email-zhengbin13@huawei.com
Signed-off-by: Doug Ledford <dledford@redhat.com>


# adb4a57a 02-May-2019 Parav Pandit <parav@mellanox.com>

RDMA/cma: Use rdma_read_gid_attr_ndev_rcu to access netdev

To access the netdevice of the GID attribute, use an existing API
rdma_read_gid_attr_ndev_rcu().

This further reduces dependency on open access to netdevice of GID
attribute.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 5d7ed2f2 10-Apr-2019 Parav Pandit <parav@mellanox.com>

RDMA/cma: Consider scope_id while binding to ipv6 ll address

When two netdevs have the same link local address (such as a vlan and a
non-vlan device), two rdma cm listen ids should be able to bind to the
following different addresses.

listener-1: addr=lla, scope_id=A, port=X
listener-2: addr=lla, scope_id=B, port=X

However, while comparing the addresses only addr and port are considered,
due to which the 2nd listener fails to listen.

In the below example of two listeners, the 2nd listener fails with an
address in use error.

$ rping -sv -a fe80::268a:7ff:feb3:d113%ens2f1 -p 4545&

$ rping -sv -a fe80::268a:7ff:feb3:d113%ens2f1.200 -p 4545
rdma_bind_addr: Address already in use

To overcome this, consider the scope_ids as well, which form the accurate
IPv6 link local address.
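
A sketch of the stricter comparison (hypothetical helper; the real check
sits in cma.c's port/address comparison paths): link-local IPv6 addresses
only match when the scope_id matches too.

#include <net/ipv6.h>

static bool cma_v6_ll_addr_equal(const struct sockaddr_in6 *a,
                                 const struct sockaddr_in6 *b)
{
        if (!ipv6_addr_equal(&a->sin6_addr, &b->sin6_addr))
                return false;
        if (ipv6_addr_type(&a->sin6_addr) & IPV6_ADDR_LINKLOCAL)
                return a->sin6_scope_id == b->sin6_scope_id;
        return true;
}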

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 061ccb52 02-Apr-2019 Leon Romanovsky <leon@kernel.org>

RDMA/cma: Set proper port number as index

The conversion from IDR to XArray missed the fact that idr_alloc() returned
the index as a return value; this index was saved in the port variable and
used as a query index later on. This caused the following error.

BUG: KASAN: use-after-free in cma_check_port+0x86a/0xa20 [rdma_cm]
Read of size 8 at addr ffff888069fde998 by task ucmatose/387
CPU: 3 PID: 387 Comm: ucmatose Not tainted 5.1.0-rc2+ #253
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
Call Trace:
dump_stack+0x7c/0xc0
print_address_description+0x6c/0x23c
? cma_check_port+0x86a/0xa20 [rdma_cm]
kasan_report.cold.3+0x1c/0x35
? cma_check_port+0x86a/0xa20 [rdma_cm]
? cma_check_port+0x86a/0xa20 [rdma_cm]
cma_check_port+0x86a/0xa20 [rdma_cm]
rdma_bind_addr+0x11bc/0x1b00 [rdma_cm]
? find_held_lock+0x33/0x1c0
? cma_ndev_work_handler+0x180/0x180 [rdma_cm]
? wait_for_completion+0x3d0/0x3d0
ucma_bind+0x120/0x160 [rdma_ucm]
? ucma_resolve_addr+0x1a0/0x1a0 [rdma_ucm]
ucma_write+0x1f8/0x2b0 [rdma_ucm]
? ucma_open+0x260/0x260 [rdma_ucm]
vfs_write+0x157/0x460
ksys_write+0xb8/0x170
? __ia32_sys_read+0xb0/0xb0
? trace_hardirqs_off_caller+0x5b/0x160
? do_syscall_64+0x18/0x3c0
do_syscall_64+0x95/0x3c0
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Allocated by task 381:
__kasan_kmalloc.constprop.5+0xc1/0xd0
cma_alloc_port+0x4d/0x160 [rdma_cm]
rdma_bind_addr+0x14e7/0x1b00 [rdma_cm]
ucma_bind+0x120/0x160 [rdma_ucm]
ucma_write+0x1f8/0x2b0 [rdma_ucm]
vfs_write+0x157/0x460
ksys_write+0xb8/0x170
do_syscall_64+0x95/0x3c0
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 381:
__kasan_slab_free+0x12e/0x180
kfree+0xed/0x290
rdma_destroy_id+0x6b6/0x9e0 [rdma_cm]
ucma_close+0x110/0x300 [rdma_ucm]
__fput+0x25a/0x740
task_work_run+0x10e/0x190
do_exit+0x85e/0x29e0
do_group_exit+0xf0/0x2e0
get_signal+0x2e0/0x17e0
do_signal+0x94/0x1570
exit_to_usermode_loop+0xfa/0x130
do_syscall_64+0x327/0x3c0
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Reported-by: <syzbot+2e3e485d5697ea610460@syzkaller.appspotmail.com>
Reported-by: Ran Rozenstein <ranro@mellanox.com>
Fixes: 638267537ad9 ("cma: Convert portspace IDRs to XArray")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 41c61401 26-Feb-2019 Parav Pandit <parav@mellanox.com>

RDMA: Check net namespace access for uverbs, umad, cma and nldev

Introduce an API rdma_dev_access_netns() to check whether an rdma device
can be accessed from a specified net namespace or not.
Use rdma_dev_access_netns() while opening the uverbs and umad character
devices, and also check it while an rdma cm_id binds to an rdma device.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 63826753 20-Feb-2019 Matthew Wilcox <willy@infradead.org>

cma: Convert portspace IDRs to XArray

Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# ea1075ed 12-Feb-2019 Jason Gunthorpe <jgg@ziepe.ca>

RDMA: Add and use rdma_for_each_port

We have many loops iterating over all of the end port numbers on a struct
ib_device, simplify them with a for_each helper.
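
Typical usage, sketched: the helper walks the device's valid end port
numbers, replacing the open-coded first-to-last port loops.

unsigned int port;

rdma_for_each_port(device, port) {
        if (rdma_protocol_roce(device, port))
                pr_debug("port %u runs RoCE\n", port);
}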

Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 926ba19b 01-Feb-2019 Steve Wise <larrystevenwise@gmail.com>

RDMA/iwcm: add tos_set bool to iw_cm struct

This allows drivers to know the tos was actively set by the application.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 9491128f 01-Feb-2019 Steve Wise <larrystevenwise@gmail.com>

RDMA/cma: listening device cm_ids should inherit tos

If a user binds to INADDR_ANY and sets the service id, then the
device-specific cm_ids should also use this tos. This allows an app to
do:

rdma_bind_addr(INADDR_ANY)
set_service_type()
rdma_listen()

And connections setup via this listening endpoint will use the correct
tos.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 2c1619ed 24-Jan-2019 Danit Goldberg <danitg@mellanox.com>

IB/cma: Define option to set ack timeout and pack tos_set

Define new option in 'rdma_set_option' to override calculated QP timeout
when requested to provide QP attributes to modify a QP.

At the same time, pack tos_set to be bitfield.
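
In-kernel ULPs reach the same override through rdma_set_ack_timeout()
(assuming the in-kernel helper added for this option); a minimal sketch,
where the argument is the IBTA exponent so the local ACK timeout becomes
4.096 us * 2^timeout:

/* Example: request roughly a 67 ms local ACK timeout (4.096 us * 2^14). */
int ret = rdma_set_ack_timeout(cm_id, 14);

if (ret)
        pr_warn("failed to override ack timeout: %d\n", ret);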

Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# a78e8723 16-Jan-2019 Leon Romanovsky <leon@kernel.org>

RDMA/cma: Remove CM_ID statistics provided by rdma-cm module

Netlink statistics exported by rdma-cm never had any working user space
component published to the mailing list or to any open source
project. Canvassing various proprietary users, and the original requester,
we find that there are no real users of this interface.

This patch simply removes all occurrences of RDMA CM netlink in favour of
modern nldev implementation, which provides the same information and
accompanied by widely used user space component.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 5fc01fb8 09-Jan-2019 Myungho Jung <mhjungk@gmail.com>

RDMA/cma: Rollback source IP address if failing to acquire device

If cma_acquire_dev_by_src_ip() returns an error in addr_handler(), the
device state changes back to RDMA_CM_ADDR_BOUND but the resolved source
IP address is still left. After that, if rdma_destroy_id() is called
after rdma_listen(), the device is freed without being removed from
listen_any_list in cma_cancel_operation(). Revert to the previous IP
address if acquiring the device fails.

Reported-by: syzbot+f3ce716af730c8f96637@syzkaller.appspotmail.com
Signed-off-by: Myungho Jung <mhjungk@gmail.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 917cb8a7 07-Jan-2019 Steve Wise <larrystevenwise@gmail.com>

RDMA/cma: Add cm_id restrack resource based on kernel or user cm_id type

A recent regression causes a null ptr crash when dumping cm_id resources.
The cma is incorrectly adding all cm_id restrack resources as kernel mode.

Fixes: af8d70375d56 ("RDMA/restrack: Resource-tracker should not use uobject pointers")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# af8d7037 17-Dec-2018 Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

RDMA/restrack: Resource-tracker should not use uobject pointers

Having uobject pointer embedded in ib core objects is not aligned with a
future shared ib_x model. The resource tracker only does this to keep
track of user/kernel objects - track this directly instead.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# dbace111 11-Oct-2018 Leon Romanovsky <leon@kernel.org>

RDMA/core: Annotate timeout as unsigned long

The ucma users supply the timeout in u32 format; it means that any number
with the most significant bit set will be converted to a negative value
by the various rdma_*, cma_* and sa_query functions, which treat the timeout
as an int.

In the lowest level, the timeout is converted back to be unsigned long.
Remove this ambiguous conversion by updating all function signatures to
receive unsigned long.

Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# d6f91252 11-Oct-2018 Leon Romanovsky <leon@kernel.org>

RDMA/cma: Remove unused timeout_ms parameter from cma_resolve_iw_route()

cma_resolve_iw_route() doesn't use timeout_ms parameter, so let's remove it.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# ed7a01fd 02-Oct-2018 Leon Romanovsky <leon@kernel.org>

RDMA/restrack: Release task struct which was hold by CM_ID object

Tracking CM_ID resource is performed in two stages: creation of cm_id
and connecting it to the cma_dev. It is needed because rdma-cm protocol
exports two separate user-visible calls rdma_create_id and rdma_accept.

At the time of CM_ID creation, the real owner of that object is not yet
known and we need to grab the task_struct. This task_struct is released or
reassigned in the attach phase later on, but a call to rdma_destroy_id left
this task_struct unreleased.

Such separation is unique to CM_ID; other restrack objects are initialized
in one shot. It means that it is safe to use the "res->valid" check to catch
an unfinished CM_ID flow and release the task_struct for that object.

Fixes: 00313983cda6 ("RDMA/nldev: provide detailed CM_ID information")
Reported-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 2165fc26 02-Oct-2018 Leon Romanovsky <leon@kernel.org>

RDMA/restrack: Consolidate task name updates in one place

Unify task update and kernel name set in one place.

Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 41ab1cb7 14-Sep-2018 Parav Pandit <parav@mellanox.com>

RDMA/cma: Introduce and use cma_ib_acquire_dev()

When an RDMA CM connect request arrives for the IB transport, it already
contains the device, port and netdevice (optional).

Instead of traversing all the cma devices, use the cma device already
found by cma_find_listener() for which a listener id is provided.

iWarp devices don't need to derive RoCE GIDs, therefore drop the RoCE
specific checks from cma_acquire_dev() and rename it to
cma_iw_acquire_dev().

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# ff11c6cd 14-Sep-2018 Parav Pandit <parav@mellanox.com>

RDMA/cma: Introduce and use cma_acquire_dev_by_src_ip()

Add a lightweight version of cma_acquire_dev() just for binding with an rdma
device based on the source IP (v4/v6) address.

This simplifies cma_acquire_dev() by avoiding listen_id specific checks and
also prepares for the subsequent IB vs iWarp simplification.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 78fb282b 14-Sep-2018 Parav Pandit <parav@mellanox.com>

RDMA/cma: Allow accepting requests for multi port rdma device

When IP failover is used between multiple ports of a given rdma device,
allow accepting CM requests from either of the ports. This is applicable
for IPv4 and IPv6 non link local addressing scheme.

IPv6 link local addresses are bound. IP failover requests for listen
cm_ids bound to specific netdev interfaces cannot be supported.
(Similar to traditional sockets).

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 43c7c851 20-Sep-2018 Jason Gunthorpe <jgg@ziepe.ca>

RDMA/core: Use dev_err/dbg/etc instead of pr_* + ibdev->name

Any messages related to a device should be printed with the dev_*
formatters. This provides greater consistency for the user.

The core does not set pr_fmt so this has no significant change.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
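
The shape of the conversion, as a sketch (the message text is made up):

	/* before: device name stitched into a pr_* message */
	pr_err("%s: couldn't query port %u\n", device->name, port);

	/* after: the dev_* formatter prefixes the device consistently */
	dev_err(&device->dev, "couldn't query port %u\n", port);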


# 0e9d2c19 04-Sep-2018 Parav Pandit <parav@mellanox.com>

RDMA/core: Consider net ns of gid attribute for RoCE

When resolving destination address or route, when net namespace is
unavailable, refer to the net namespace of the netdevice of the SGID
attribute. This is typically the case for requests arriving from the
network for RoCE ports.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 77addc52 04-Sep-2018 Parav Pandit <parav@mellanox.com>

RDMA/core: Rename rdma_copy_addr to rdma_copy_src_l2_addr

Now that rdma_copy_addr() only copies the source addresses and all callers
are interested in copying only source addresses, simplify it to drop the
destination address argument.

Given that it only copies source layer2 addresses, rename it to
rdma_copy_src_l2_addr for better code readability.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 722c7b2b 28-Aug-2018 Parav Pandit <parav@mellanox.com>

RDMA/{cma, core}: Avoid callback on rdma_addr_cancel()

Currently rdma_addr_cancel() is an async operation, which notifies that the
cancel is done by executing the callback function given during
rdma_resolve_ip(). If the resolve_ip request has already completed then the
callback is not executed.

Instead, rdma_resolve_addr() and rdma_addr_cancel() are now simplified in the
following ways.
1. rdma_addr_cancel() is now a synchronous method. If a request was
pending, after it is cancelled, no callback is notified.
2. rdma_resolve_addr() and the respective addr_handler() callback don't
need to hold a reference to the cm_id.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 954a8e3a 29-Aug-2018 Parav Pandit <parav@mellanox.com>

RDMA/cma: Protect cma dev list with lock

When AF_IB addresses are used during rdma_resolve_addr() a lock is not
held. A cma device can get removed while list traversal is in progress,
which may lead to a crash, i.e.:

CPU0 CPU1
==== ====
rdma_resolve_addr()
cma_resolve_ib_dev()
list_for_each() cma_remove_one()
cur_dev->device mutex_lock(&lock)
list_del();
mutex_unlock(&lock);
cma_process_remove();


Therefore, hold a lock while traversing the list, which avoids such a
situation.

Cc: <stable@vger.kernel.org> # 3.10
Fixes: f17df3b0dede ("RDMA/cma: Add support for AF_IB to rdma_resolve_addr()")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
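
The fix pattern, sketched (the list and lock names mirror the globals in
cma.c but are shown here only for illustration):

	mutex_lock(&lock);
	list_for_each_entry(cur_dev, &dev_list, list) {
		/* cur_dev->device cannot be freed by cma_remove_one()
		 * while the lock is held */
	}
	mutex_unlock(&lock);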


# 85463316 29-Jul-2018 Parav Pandit <parav@mellanox.com>

RDMA/core: Prefix _ib to IB/RoCE specific functions

In the rdma cm module, functions which are common between IB and iWarp
are named with a cma_ prefix.
iWarp specific functions are prefixed with cma_iw.
IB specific functions are prefixed with cma_ib.

However some functions in the request processing path didn't follow the
cma_ib notion. Prefix them with _ib for better code clarity.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 79d684f0 29-Jul-2018 Parav Pandit <parav@mellanox.com>

RDMA/core: Simplify gid type check in cma_acquire_dev()

cma_add_one() initializes the default GID regardless of device type.
listen_id is bound to a device and an IP address, its GID type is
initialized by cma_acquire_dev().

Therefore a valid default GID type is always available and there is no need
to check the port type during cma_acquire_dev().

Initialize gid type of a cm id when the cm_id is created instead of
doing conditional checks during cma_acquire_dev() and trying to
initialize to 0 during _cma_attach_to_dev().

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 7582df82 29-Jul-2018 Parav Pandit <parav@mellanox.com>

RDMA/core: Avoid holding lock while initializing fields on stack

In various functions rdma_cm_event is zero initialized on the stack using
memset() while holding a lock, which is not necessary.
Therefore, don't hold the lock while initializing it on the stack.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# ca3a8ace 29-Jul-2018 Parav Pandit <parav@mellanox.com>

RDMA/core: Return bool instead of int

Return bool for following internal and inline functions as their
underlying APIs return bool too.

1. cma_zero_addr()
2. cma_loopback_addr()
3. cma_any_addr()
4. ib_addr_any()
5. ib_addr_loopback()

While we are touching cma_loopback_addr(), remove extra white spaces
in it.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 05e0b86c 29-Jul-2018 Parav Pandit <parav@mellanox.com>

RDMA/cma: Get rid of 1 bit boolean

Arrange fields of cma_req_info structure for efficiency on
stack and get rid of one bit boolean field.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# e7ff98ae 29-Jul-2018 Parav Pandit <parav@mellanox.com>

RDMA/cma: Constify path record, ib_cm_event, listen_id pointers

Constify several pointers such as path_rec, ib_cm_event and listen_id
pointers in several functions.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 2df7dba8 29-Jul-2018 Parav Pandit <parav@mellanox.com>

RDMA/core: Constify dst_addr argument

The following APIs are not supposed to modify the addr or dest_addr contents.
Therefore make those function arguments const for better code
readability.

1. rdma_resolve_ip()
2. rdma_addr_size()
3. rdma_resolve_addr()

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 219d2e9d 29-Jul-2018 Parav Pandit <parav@mellanox.com>

RDMA/cma: Simplify rdma_resolve_addr() error flow

Currently the dst address is first set and later cleared if any of the
3 error conditions is met.
However none of the APIs or checks are supposed to refer to the
destination address of the cm_id.
Therefore, set the destination address only after the necessary checks pass,
which simplifies the error flow.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# e11fef9f 29-Jul-2018 Parav Pandit <parav@mellanox.com>

RDMA/cma: Initialize resource type in __rdma_create_id()

Currently rdma_cm_id's resource tracking fields, such as the owner task and
kern_name, and other non resource tracking fields are initialized in
a single function, __rdma_create_id().

Therefore, initialize rdma_cm_id's resource type in the same init
function as well.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 643d213a 16-Jul-2018 Parav Pandit <parav@mellanox.com>

RDMA/cma: Do not ignore net namespace for unbound cm_id

Currently, if the cm_id is not bound to any netdevice, then for such a cm_id
the net namespace is ignored, which is incorrect.

Regardless of whether the cm_id is bound to a netdevice or not, the net
namespace must match. When a cm_id is bound to a netdevice, both the net
namespace and the netdevice must match.

Fixes: 4c21b5bcef73 ("IB/cma: Add net_dev and private data checks to RDMA CM")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# d274e45c 16-Jul-2018 Parav Pandit <parav@mellanox.com>

RDMA/cma: Consider netdevice for RoCE ports

When a netdevice is not found for a request, and it is for a RoCE port,
currently the code allows matching the listener, as long as the port number
matches, by ignoring the netdevice.

Now that we always prefer to have netdevice associated with RoCE, when
netdevice is not found, don't consider RoCE ports.

In other words, a NULL netdevice with RoCE is not acceptable. Therefore,
remove this confusing RoCE port ignorance check.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# cee10433 16-Jul-2018 Parav Pandit <parav@mellanox.com>

IB/core: Introduce and use sgid_attr in CM requests

For RoCE, when CM requests are received for RC and UD connections,
netdevice of the incoming request is unavailable. Because of that CM
requests are always forwarded to init_net namespace.

Now that we have the GID attribute available, introduce SGID attribute in
incoming CM requests and refer to the netdevice of it. This is similar to
existing SGID attribute field in outgoing CM requests for RC and UD
transports.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 076dd53b 25-Jul-2018 Varsha Rao <rvarsha016@gmail.com>

IB/core: Remove extra parentheses

Remove unnecessary parentheses to fix the clang warning of extraneous
parentheses.

Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# c0126915 11-Jul-2018 Jason Gunthorpe <jgg@ziepe.ca>

IB/cm: Remove cma_multicast->igmp_joined

This variable isn't read and written to with proper locking, so it is
racy. Instead of using an unlocked bool, use presence on the mc->list.

The caller could race rdma_join_multicast with rdma_leave_multicast which
would leak a mc join and cause a use after free of mc.

Instead, do not add the mc to the list until it has completed
initialization; all mcs on the list require leaving.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>


# 39839107 19-Jun-2018 Parav Pandit <parav@mellanox.com>

IB/cm: Replace members of sa_path_rec with 'struct sgid_attr *'

While processing a path record entry in CM messages the associated GID
attribute is now also supplied.

Currently for RoCE a netdevice's net namespace pointer and ifindex are
stored in path record entry. Both of these fields of the netdev can change
anytime while processing CM messages. Additionally storing net namespace
without holding reference will lead to use-after-free crash. Therefore it
is removed. Netdevice information for RoCE is instead provided via
referenced gid attribute in ib_cm requests.

Such a design leads to a situation where the kernel can crash when the net
pointer becomes invalid. However today it is always initialized to
init_net, which cannot become invalid. In order to support processing
packets in any arbitrary namespace of the received packet, it is necessary
to avoid such conditions.

This patch removes the dependency on the net pointer and ifindex; instead
it will rely on SGID attribute which contains a pointer to netdev.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>


# 815d456e 19-Jun-2018 Parav Pandit <parav@mellanox.com>

IB/cm: Pass the sgid_attr through various events

Make the sgid_attr available along with path information to the event
consumer, this allows the consumer to keep using the same GID table entry
as the event is related to.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>


# 4ed13a5f 19-Jun-2018 Parav Pandit <parav@mellanox.com>

IB/cm: Keep track of the sgid_attr that created the cm id

Hold a reference to the sgid_attr which is used in a cm_id until the
cm_id is destroyed.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>


# aa74f487 19-Jun-2018 Parav Pandit <parav@mellanox.com>

IB: Make init_ah_attr_grh_fields set sgid_attr

Use the sgid and other information from the path record to figure out the
sgid_attrs.

Store the selected table entry in the sgid_attr for everything else to
use.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>


# f685c195 19-Jun-2018 Parav Pandit <parav@mellanox.com>

IB: Make ib_init_ah_from_mcmember set sgid_attr

This is really just a CM support function; normally a multicast address
does not have a specific SGID, but the RDMA CM usage model does restrict
things to the netdevice the CM id is bound to, at least for the RoCE case.

Store the selected table entry in the sgid_attr for everything else to
use.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>


# 88145678 21-Jun-2018 Parav Pandit <parav@mellanox.com>

RDMA/cma: Consider net namespace while leaving multicast group

When sending multicast leave request, consider the net ns in which this
cm_id is created.

Code was duplicated in cma_leave_mc_groups() and rdma_leave_multicast(),
which is now done using a helper function cma_leave_roce_mc_group().

Fixes: bee3c3c91865 ("IB/cma: Join and leave multicast groups with IGMP")
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 1dfce294 04-Jun-2018 Parav Pandit <parav@mellanox.com>

IB: Replace ib_query_gid/ib_get_cached_gid with rdma_query_gid

If the gid_attr argument is NULL then the functions behave identically to
rdma_query_gid. ib_query_gid just calls ib_get_cached_gid, so everything
can be consolidated to one function.

Now that all callers either use rdma_query_gid() or ib_get_cached_gid(),
ib_query_gid() API is removed.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 6da2ec56 12-Jun-2018 Kees Cook <keescook@chromium.org>

treewide: kmalloc() -> kmalloc_array()

The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
patch replaces cases of:

kmalloc(a * b, gfp)

with:
kmalloc_array(a * b, gfp)

as well as handling cases of:

kmalloc(a * b * c, gfp)

with:

kmalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

kmalloc_array(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

kmalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The tools/ directory was manually excluded, since it has its own
implementation of kmalloc().

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
kmalloc(
- (sizeof(TYPE)) * E
+ sizeof(TYPE) * E
, ...)
|
kmalloc(
- (sizeof(THING)) * E
+ sizeof(THING) * E
, ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
kmalloc(
- sizeof(u8) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(__u8) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(char) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(unsigned char) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(u8) * COUNT
+ COUNT
, ...)
|
kmalloc(
- sizeof(__u8) * COUNT
+ COUNT
, ...)
|
kmalloc(
- sizeof(char) * COUNT
+ COUNT
, ...)
|
kmalloc(
- sizeof(unsigned char) * COUNT
+ COUNT
, ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * (COUNT_ID)
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * COUNT_ID
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * (COUNT_CONST)
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * COUNT_CONST
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * (COUNT_ID)
+ COUNT_ID, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * COUNT_ID
+ COUNT_ID, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * (COUNT_CONST)
+ COUNT_CONST, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * COUNT_CONST
+ COUNT_CONST, sizeof(THING)
, ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kmalloc
+ kmalloc_array
(
- SIZE * COUNT
+ COUNT, SIZE
, ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
kmalloc(
- sizeof(TYPE) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(TYPE) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(TYPE) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(TYPE) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(THING) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kmalloc(
- sizeof(THING) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kmalloc(
- sizeof(THING) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kmalloc(
- sizeof(THING) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
kmalloc(
- sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kmalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kmalloc(
- sizeof(THING1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kmalloc(
- sizeof(THING1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kmalloc(
- sizeof(TYPE1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
|
kmalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
kmalloc(
- (COUNT) * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- (COUNT) * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- (COUNT) * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- (COUNT) * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
kmalloc(C1 * C2 * C3, ...)
|
kmalloc(
- (E1) * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
|
kmalloc(
- (E1) * (E2) * E3
+ array3_size(E1, E2, E3)
, ...)
|
kmalloc(
- (E1) * (E2) * (E3)
+ array3_size(E1, E2, E3)
, ...)
|
kmalloc(
- E1 * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
kmalloc(sizeof(THING) * C2, ...)
|
kmalloc(sizeof(TYPE) * C2, ...)
|
kmalloc(C1 * C2 * C3, ...)
|
kmalloc(C1 * C2, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * (E2)
+ E2, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * E2
+ E2, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * (E2)
+ E2, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * E2
+ E2, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- (E1) * E2
+ E1, E2
, ...)
|
- kmalloc
+ kmalloc_array
(
- (E1) * (E2)
+ E1, E2
, ...)
|
- kmalloc
+ kmalloc_array
(
- E1 * E2
+ E1, E2
, ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>


# 671a6cc2 29-May-2018 Leon Romanovsky <leon@kernel.org>

RDMA/cma: Ignore unknown event

There is no need to bring down the whole machine just because an unknown
event was received. It is better to ignore it silently.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# fbdb0a91 10-May-2018 Steve Wise <larrystevenwise@gmail.com>

RDMA/CMA: add rdma_iw_cm_id() and rdma_res_to_id() helpers

Add a helper function for iwarp drivers to be able to map an
rdma_cm_id to an iw_cm_id. This is useful for dumping driver specific
NLDEV/RESTRACK connection state.

Add a helper to return the rdma_cm_id pointer from the rdma_restrack
pointer. This is needed for rdma drivers to map a res entry back to
the public rdma_cm_id struct.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 9aa16921 02-May-2018 Parav Pandit <parav@mellanox.com>

RDMA/cma: Do not query GID during QP state transition to RTR

When commit [1] was added, SGID was queried to derive the SMAC address.
Then, later on during a refactor [2], SMAC was no longer needed. However,
the now useless GID query remained. Then during additional code changes
later on, the GID query was being done in such a way that it caused iWARP
queries to start breaking. Remove the useless GID query and resolve the
iWARP breakage at the same time.

This is discussed in [3].

[1] commit dd5f03beb4f7 ("IB/core: Ethernet L2 attributes in verbs/cm structures")
[2] commit 5c266b2304fb ("IB/cm: Remove the usage of smac and vid of qp_attr and cm_av")
[3] https://www.spinics.net/lists/linux-rdma/msg63951.html

Suggested-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 2918c1a9 24-Apr-2018 Parav Pandit <parav@mellanox.com>

RDMA/cma: Fix use after destroy access to net namespace for IPoIB

There are a few issues with validation of the netdevice and listen id lookup
for IB (IPoIB) while processing an incoming CM request, as below.

1. While performing lookup of bind_list in cma_ps_find(), net namespace
of the netdevice can get deleted in cma_exit_net(), resulting in use
after free access of idr and/or net namespace structures.
This lookup occurs from the workqueue context (and not userspace
context where net namespace is always valid).

CPU0 CPU1
==== ====

bind_list = cma_ps_find();
move netdevice to new namespace
delete net namespace
cma_exit_net()
idr_destroy(idr);

[..]
cma_find_listener(bind_list, ..);

2. While netdevice is validated for IP address in given net namespace,
netdevice's net namespace and/or ifindex can change in
cma_get_net_dev() and cma_match_net_dev().

The above issues are overcome by using the rcu lock along with the netdevice
UP/DOWN state, as described below.
When a net namespace is getting deleted, the netdevice is closed and
shut down before moving it back to the init_net namespace.
change_net_namespace() synchronizes with any existing use of the netdevice
before changing the netdev properties such as net or ifindex.
Once the netdevice IFF_UP flag is cleared, such fields are not guaranteed
to be valid.
Therefore, the rcu lock along with the netdevice state check ensures that,
while the route lookup and cm_id lookup are in progress, the netdevice of
interest won't migrate to any other net namespace.
This ensures that the associated net namespace of the netdevice won't get
deleted while the rcu lock is held for a netdevice which is in the IFF_UP state.

Fixes: fa20105e09e9 ("IB/cma: Add support for network namespaces")
Fixes: 4be74b42a6d0 ("IB/cma: Separate port allocation to network namespaces")
Fixes: f887f2ac87c2 ("IB/cma: Validate routing of incoming requests")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
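
A hedged sketch of the lookup pattern the fix relies on (the helper name
do_listener_lookup() is a placeholder):

	rcu_read_lock();
	ndev = dev_get_by_index_rcu(net, ifindex);
	if (ndev && (ndev->flags & IFF_UP)) {
		/* A netdevice cannot change its net namespace while it is
		 * IFF_UP and rcu_read_lock() is held, so the bind_list and
		 * listener lookup below see a stable namespace. */
		do_listener_lookup(ndev);
	}
	rcu_read_unlock();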


# ee6548d1 02-Apr-2018 Jason Gunthorpe <jgg@ziepe.ca>

RDMA/rdma_cm: Delete rdma_addr_client

The only thing it does is block module unload while work is posted from
rdma_resolve_ip().

However, this is not the right place to do this. The users of
rdma_resolve_ip() must ensure their own module does not unload until
rdma_resolve_ip() calls the callback, or until rdma_addr_cancel() is
called.

Similarly callers to rdma_addr_find_l2_eth_by_grh() must ensure their
module does not unload while they are calling code.

The only two users are already safe, so there is no need for this.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 2253fc0c 27-Mar-2018 Steve Wise <larrystevenwise@gmail.com>

RDMA/CMA: Add rdma_port_space to UAPI

Since the rdma_port_space enum is being passed between user and kernel for
user cm_id setup, we need it in a UAPI header. So add it to
rdma_user_cm.h.

This also fixes the cm_id restrack changes which pass up the port space
value via the RDMA_NLDEV_ATTR_RES_PS attribute.

Fixes: 00313983cda6 ("RDMA/nldev: provide detailed CM_ID information")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 2f635cee 27-Mar-2018 Kirill Tkhai <ktkhai@virtuozzo.com>

net: Drop pernet_operations::async

Synchronous pernet_operations are not allowed anymore.
All are asynchronous. So, drop the structure member.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 114cc9c4 21-Mar-2018 Parav Pandit <parav@mellanox.com>

IB/cma: Resolve route only while receiving CM requests

Currently CM request for RoCE follows following flow.
rdma_create_id()
rdma_resolve_addr()
rdma_resolve_route()
For RC QPs:
rdma_connect()
->cma_connect_ib()
->ib_send_cm_req()
->cm_init_av_by_path()
->ib_init_ah_attr_from_path()
For UD QPs:
rdma_connect()
->cma_resolve_ib_udp()
->ib_send_cm_sidr_req()
->cm_init_av_by_path()
->ib_init_ah_attr_from_path()

In both the flows, route is already resolved before sending CM requests.
Therefore, code is refactored to avoid resolving route second time in
ib_cm layer.
ib_init_ah_attr_from_path() is extended to resolve route when it is not
yet resolved for RoCE link layer. This is achieved by caller setting
route_resolved field in path record whenever it has route already
resolved.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 5ac08a34 13-Mar-2018 Parav Pandit <parav@mellanox.com>

IB/cma: Use rdma_protocol_roce() and remove cma_protocol_roce_dev_port()

rdma_protocol_roce() API from the ib_core already provides a way to
detect whether a given device+port is RoCE or not.
Therefore, make use of it and avoid implementing it again in rdmacm
module.

Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 6d337179 13-Mar-2018 Parav Pandit <parav@mellanox.com>

IB/core: Honor return status of ib_init_ah_from_mcmember()

The return status of ib_init_ah_from_mcmember() is ignored by
cma_ib_mc_handler(). Honor it and return an error event if AH attribute
initialization fails.

Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 7688f2c3 13-Mar-2018 Leon Romanovsky <leon@kernel.org>

RDMA/ucma: Fix access to non-initialized CM_ID object

The attempt to join multicast group without ensuring that CMA device
exists will lead to the following crash reported by syzkaller.

[ 64.076794] BUG: KASAN: null-ptr-deref in rdma_join_multicast+0x26e/0x12c0
[ 64.076797] Read of size 8 at addr 00000000000000b0 by task join/691
[ 64.076797]
[ 64.076800] CPU: 1 PID: 691 Comm: join Not tainted 4.16.0-rc1-00219-gb97853b65b93 #23
[ 64.076802] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-proj4
[ 64.076803] Call Trace:
[ 64.076809] dump_stack+0x5c/0x77
[ 64.076817] kasan_report+0x163/0x380
[ 64.085859] ? rdma_join_multicast+0x26e/0x12c0
[ 64.086634] rdma_join_multicast+0x26e/0x12c0
[ 64.087370] ? rdma_disconnect+0xf0/0xf0
[ 64.088579] ? __radix_tree_replace+0xc3/0x110
[ 64.089132] ? node_tag_clear+0x81/0xb0
[ 64.089606] ? idr_alloc_u32+0x12e/0x1a0
[ 64.090517] ? __fprop_inc_percpu_max+0x150/0x150
[ 64.091768] ? tracing_record_taskinfo+0x10/0xc0
[ 64.092340] ? idr_alloc+0x76/0xc0
[ 64.092951] ? idr_alloc_u32+0x1a0/0x1a0
[ 64.093632] ? ucma_process_join+0x23d/0x460
[ 64.094510] ucma_process_join+0x23d/0x460
[ 64.095199] ? ucma_migrate_id+0x440/0x440
[ 64.095696] ? futex_wake+0x10b/0x2a0
[ 64.096159] ucma_join_multicast+0x88/0xe0
[ 64.096660] ? ucma_process_join+0x460/0x460
[ 64.097540] ? _copy_from_user+0x5e/0x90
[ 64.098017] ucma_write+0x174/0x1f0
[ 64.098640] ? ucma_resolve_route+0xf0/0xf0
[ 64.099343] ? rb_erase_cached+0x6c7/0x7f0
[ 64.099839] __vfs_write+0xc4/0x350
[ 64.100622] ? perf_syscall_enter+0xe4/0x5f0
[ 64.101335] ? kernel_read+0xa0/0xa0
[ 64.103525] ? perf_sched_cb_inc+0xc0/0xc0
[ 64.105510] ? syscall_exit_register+0x2a0/0x2a0
[ 64.107359] ? __switch_to+0x351/0x640
[ 64.109285] ? fsnotify+0x899/0x8f0
[ 64.111610] ? fsnotify_unmount_inodes+0x170/0x170
[ 64.113876] ? __fsnotify_update_child_dentry_flags+0x30/0x30
[ 64.115813] ? ring_buffer_record_is_on+0xd/0x20
[ 64.117824] ? __fget+0xa8/0xf0
[ 64.119869] vfs_write+0xf7/0x280
[ 64.122001] SyS_write+0xa1/0x120
[ 64.124213] ? SyS_read+0x120/0x120
[ 64.126644] ? SyS_read+0x120/0x120
[ 64.128563] do_syscall_64+0xeb/0x250
[ 64.130732] entry_SYSCALL_64_after_hwframe+0x21/0x86
[ 64.132984] RIP: 0033:0x7f5c994ade99
[ 64.135699] RSP: 002b:00007f5c99b97d98 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 64.138740] RAX: ffffffffffffffda RBX: 00000000200001e4 RCX: 00007f5c994ade99
[ 64.141056] RDX: 00000000000000a0 RSI: 00000000200001c0 RDI: 0000000000000015
[ 64.143536] RBP: 00007f5c99b97ec0 R08: 0000000000000000 R09: 0000000000000000
[ 64.146017] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f5c99b97fc0
[ 64.148608] R13: 0000000000000000 R14: 00007fff660e1c40 R15: 00007f5c99b989c0
[ 64.151060]
[ 64.153703] Disabling lock debugging due to kernel taint
[ 64.156032] BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0
[ 64.159066] IP: rdma_join_multicast+0x26e/0x12c0
[ 64.161451] PGD 80000001d0298067 P4D 80000001d0298067 PUD 1dea39067 PMD 0
[ 64.164442] Oops: 0000 [#1] SMP KASAN PTI
[ 64.166817] CPU: 1 PID: 691 Comm: join Tainted: G B 4.16.0-rc1-00219-gb97853b65b93 #23
[ 64.170004] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-proj4
[ 64.174985] RIP: 0010:rdma_join_multicast+0x26e/0x12c0
[ 64.177246] RSP: 0018:ffff8801c8207860 EFLAGS: 00010282
[ 64.179901] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff94789522
[ 64.183344] RDX: 1ffffffff2d50fa5 RSI: 0000000000000297 RDI: 0000000000000297
[ 64.186237] RBP: ffff8801c8207a50 R08: 0000000000000000 R09: ffffed0039040ea7
[ 64.189328] R10: 0000000000000001 R11: ffffed0039040ea6 R12: 0000000000000000
[ 64.192634] R13: 0000000000000000 R14: ffff8801e2022800 R15: ffff8801d4ac2400
[ 64.196105] FS: 00007f5c99b98700(0000) GS:ffff8801e5d00000(0000) knlGS:0000000000000000
[ 64.199211] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 64.202046] CR2: 00000000000000b0 CR3: 00000001d1c48004 CR4: 00000000003606a0
[ 64.205032] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 64.208221] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 64.211554] Call Trace:
[ 64.213464] ? rdma_disconnect+0xf0/0xf0
[ 64.216124] ? __radix_tree_replace+0xc3/0x110
[ 64.219337] ? node_tag_clear+0x81/0xb0
[ 64.222140] ? idr_alloc_u32+0x12e/0x1a0
[ 64.224422] ? __fprop_inc_percpu_max+0x150/0x150
[ 64.226588] ? tracing_record_taskinfo+0x10/0xc0
[ 64.229763] ? idr_alloc+0x76/0xc0
[ 64.232186] ? idr_alloc_u32+0x1a0/0x1a0
[ 64.234505] ? ucma_process_join+0x23d/0x460
[ 64.237024] ucma_process_join+0x23d/0x460
[ 64.240076] ? ucma_migrate_id+0x440/0x440
[ 64.243284] ? futex_wake+0x10b/0x2a0
[ 64.245302] ucma_join_multicast+0x88/0xe0
[ 64.247783] ? ucma_process_join+0x460/0x460
[ 64.250841] ? _copy_from_user+0x5e/0x90
[ 64.253878] ucma_write+0x174/0x1f0
[ 64.257008] ? ucma_resolve_route+0xf0/0xf0
[ 64.259877] ? rb_erase_cached+0x6c7/0x7f0
[ 64.262746] __vfs_write+0xc4/0x350
[ 64.265537] ? perf_syscall_enter+0xe4/0x5f0
[ 64.267792] ? kernel_read+0xa0/0xa0
[ 64.270358] ? perf_sched_cb_inc+0xc0/0xc0
[ 64.272575] ? syscall_exit_register+0x2a0/0x2a0
[ 64.275367] ? __switch_to+0x351/0x640
[ 64.277700] ? fsnotify+0x899/0x8f0
[ 64.280530] ? fsnotify_unmount_inodes+0x170/0x170
[ 64.283156] ? __fsnotify_update_child_dentry_flags+0x30/0x30
[ 64.286182] ? ring_buffer_record_is_on+0xd/0x20
[ 64.288749] ? __fget+0xa8/0xf0
[ 64.291136] vfs_write+0xf7/0x280
[ 64.292972] SyS_write+0xa1/0x120
[ 64.294965] ? SyS_read+0x120/0x120
[ 64.297474] ? SyS_read+0x120/0x120
[ 64.299751] do_syscall_64+0xeb/0x250
[ 64.301826] entry_SYSCALL_64_after_hwframe+0x21/0x86
[ 64.304352] RIP: 0033:0x7f5c994ade99
[ 64.306711] RSP: 002b:00007f5c99b97d98 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 64.309577] RAX: ffffffffffffffda RBX: 00000000200001e4 RCX: 00007f5c994ade99
[ 64.312334] RDX: 00000000000000a0 RSI: 00000000200001c0 RDI: 0000000000000015
[ 64.315783] RBP: 00007f5c99b97ec0 R08: 0000000000000000 R09: 0000000000000000
[ 64.318365] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f5c99b97fc0
[ 64.320980] R13: 0000000000000000 R14: 00007fff660e1c40 R15: 00007f5c99b989c0
[ 64.323515] Code: e8 e8 79 08 ff 4c 89 ff 45 0f b6 a7 b8 01 00 00 e8 68 7c 08 ff 49 8b 1f 4d 89 e5 49 c1 e4 04 48 8
[ 64.330753] RIP: rdma_join_multicast+0x26e/0x12c0 RSP: ffff8801c8207860
[ 64.332979] CR2: 00000000000000b0
[ 64.335550] ---[ end trace 0c00c17a408849c1 ]---

Reported-by: <syzbot+e6aba77967bd72cbc9d6@syzkaller.appspotmail.com>
Fixes: c8f6a362bf3e ("RDMA/cma: Add multicast communication support")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 9dea9a2f 12-Mar-2018 Tatyana Nikolova <tatyana.e.nikolova@intel.com>

RDMA/core: Do not use invalid destination in determining port reuse

cma_port_is_unique() allows local port reuse if the quad (source
address and port, destination address and port) for this connection
is unique. However, if the destination info is zero or unspecified, it
can't make a correct decision but still allows port reuse. For example,
sometimes rdma_bind_addr() is called with unspecified destination and
reusing the port can lead to creating a connection with a duplicate quad,
after the destination is resolved. The issue manifests when MPI scale-up
tests hang after the duplicate quad is used.

Set the destination address family and add checks for zero destination
address and port to prevent source port reuse based on invalid destination.

Fixes: 19b752a19dce ("IB/cma: Allow port reuse for rdma_id")
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 00313983 01-Mar-2018 Steve Wise <larrystevenwise@gmail.com>

RDMA/nldev: provide detailed CM_ID information

Implement RDMA nldev netlink interface to get detailed CM_ID information.

Because cm_id's are attached to rdma devices in various work queue
contexts, the pid and task information at restrack_add() time is sometimes
not useful. For example, an nvme/f host connection cm_id ends up being
bound to a device in a work queue context and the resulting pid at attach
time no longer exists after connection setup. So instead we mark all
cm_id's created via the rdma_ucm as "user", and all others as "kernel".
This required tweaking the restrack code a little. It also required
wrapping some rdma_cm functions to allow passing the module name string.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# a3b641af 01-Mar-2018 Steve Wise <larrystevenwise@gmail.com>

RDMA/CM: move rdma_id_private to cma_priv.h

Move struct rdma_id_private to a new header cma_priv.h so the resource
tracking services in core/nldev.c can read useful information about cm_ids.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# b75cc8f9 02-Mar-2018 David Ahern <dsahern@gmail.com>

net/ipv6: Pass skb to route lookup

IPv6 does path selection for multipath routes deep in the lookup
functions. The next patch adds L4 hash option and needs the skb
for the forward path. To get the skb to the relevant FIB lookup
functions it needs to go through the fib rules layer, so add a
lookup_data argument to the fib_lookup_arg struct.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 25354866 26-Feb-2018 Kirill Tkhai <ktkhai@virtuozzo.com>

net: Convert cma_pernet_operations

These pernet_operations just create and destroy IDR.
So, we mark them as async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3cd96fdd 28-Jan-2018 Parav Pandit <parav@mellanox.com>

RDMA/cma: Use existing netif_is_bond_master function

When checking whether the current netdev is the bond master interface,
use the kernel API netif_is_bond_master() instead of hardcoding the check.
No functionality is changed.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
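
Illustratively, the open-coded test the helper replaces (use_bond_master()
is a placeholder):

	/* before: hardcoded flag check */
	if ((dev->flags & IFF_MASTER) && (dev->priv_flags & IFF_BONDING))
		use_bond_master(dev);

	/* after: the existing kernel helper */
	if (netif_is_bond_master(dev))
		use_bond_master(dev);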


# 052eac6e 09-Jan-2018 Parav Pandit <parav@mellanox.com>

RDMA/cma: Update RoCE multicast routines to use net namespace

rdma_dev_addr contains the net namespace pointer; when referring to the
bound_dev_if of the rdma_dev_addr, refer to the net namespace of the
rdma_cm_id stored in rdma_dev_addr.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 66c74d74 09-Jan-2018 Parav Pandit <parav@mellanox.com>

RDMA/cma: Update cma_validate_port to honor net namespace

cma_validate_port uses rdma_dev_addr to validate the port of the cm_id.
It needs to honor the net namespace, which is set up during cm_id creation,
when finding the netdevice.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 2493a57b 09-Jan-2018 Parav Pandit <parav@mellanox.com>

RDMA/cma: Refactor to access multiple fields of rdma_dev_addr

Pass the rdma_cm_id so that multiple fields of the rdma_dev_addr
structure can be accessed, instead of passing each individual fields.

This is needed to access some additional fields in followup patches.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 00db63c1 09-Jan-2018 Parav Pandit <parav@mellanox.com>

RDMA/cma: Check existence of netdevice during port validation

If valid netdevice is not found for RoCE, GID table should not be
searched with NULL netdevice.

Doing so causes the search routines to ignore the netdev argument and may
match the wrong GID table entry if the netdev is deleted.

Fixes: abae1b71dd37 ("IB/cma: cma_validate_port should verify the port and netdevice")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 411460ac 18-Jan-2018 Parav Pandit <parav@mellanox.com>

RDMA/cma: Introduce API to read GIDs for multiple transports

This patch introduces an API that allows legacy applications to query
GIDs for an rdma_cm_id which is used during connection establishment.

GIDs are stored and created differently for the iWarp, IB and RoCE transports.
Therefore rdma_read_gids() returns the GID for all the transports, hiding
such internal details from the caller.
It is usable for both client side and server side connections.

In general continued use of GID based addressing outside of IB is
discouraged, so rdma_read_gids() should not be used by any new ULPs.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 8d20a1f0 08-Jan-2018 Parav Pandit <parav@mellanox.com>

RDMA/cma: Fix rdma_cm raw IB path setting for RoCE

rdma_set_ib_path() missed setting path record fields for the RoCE
transport when RoCE support was added.

This results in errors such as an incorrect ndev, destination mac address
and GID type when user space attempts to set a raw
IB path using the RoCE IB path compatibility mapping.

Fixes: 3c86aa70bf67 ("RDMA/cm: Add RDMA CM support for IBoE devices")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# fe75889f 08-Jan-2018 Parav Pandit <parav@mellanox.com>

RDMA/{cma, ucma}: Simplify and rename rdma_set_ib_paths

Since 2006 no rdmacm based application has made use of setting multiple
path records using the rdma_set_ib_paths API.

Therefore the code is simplified to allow setting one path record entry.
Now that it sets only a single path, it is renamed to reflect the same.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 9327c7af 08-Jan-2018 Parav Pandit <parav@mellanox.com>

RDMA/cma: Provide a function to set RoCE path record L2 parameters

Introduce a helper function to set path record L2 fields for RoCE.
This includes setting GID type, destination mac address and netdev
ifindex.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 608bc446 08-Jan-2018 Parav Pandit <parav@mellanox.com>

RDMA/cma: Use the right net namespace for the rdma_cm_id

The net namespace is set in addr during create_rdma_id();
cma_resolve_iboe_route() should use that instead of the
init namespace.

The original code was added in commit fa20105e09e9 ("IB/cma: Add support
for network namespaces"), but this path wasn't in use back then.

This patch updates the code to use right namespace, as preparation
for improving namespace support.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# e48e5e19 01-Jan-2018 Leon Romanovsky <leon@kernel.org>

RDMA/cma: Mark end of CMA ID messages

The commit 1a1c116f3dcf ("RDMA/netlink: Simplify the put_msg and put_attr")
removed the nlmsg_len calculation in ibnl_put_attr, causing netlink messages
to miss the source and destination addresses.

Fixes: 1a1c116f3dcf ("RDMA/netlink: Simplify the put_msg and put_attr")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 4ad6a024 14-Nov-2017 Parav Pandit <parav@mellanox.com>

IB/{core, cm, cma, ipoib}: Rename ib_init_ah_from_path to ib_init_ah_attr_from_path

Since ib_init_ah_from_path initializes the address handle attribute, it is
renamed to reflect so.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 575c7e58 14-Nov-2017 Parav Pandit <parav@mellanox.com>

RDMA/{core, cma}: Simplify rdma_translate_ip

Since no caller needs the vlan, rdma_translate_ip is simplified to drop the
vlan pointer.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 981b5a23 14-Nov-2017 Parav Pandit <parav@mellanox.com>

RDMA/cma: Introduce and use helper functions to init work

Introduce and use helper functions to initialize the work for the address
resolved and route resolved events, which avoids code duplication in a few
places.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# c4238805 14-Nov-2017 Parav Pandit <parav@mellanox.com>

RDMA/cma: Avoid setting path record type twice

Avoid setting path record type twice for RoCE.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 4367ec7f 14-Nov-2017 Parav Pandit <parav@mellanox.com>

RDMA/cma: Simplify netdev check

The current code checks for a NULL ndev twice, where the 2nd check is always
redundant given the fact that during the route resolving stage the device
address must be bound to a netdevice interface.

This patch simplifies that check.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 5c181bda 14-Nov-2017 Parav Pandit <parav@mellanox.com>

RDMA/cma: Set default GID type as RoCE when resolving RoCE route

As the function name suggests cma_resolve_iboe_route() resolves RoCE
route. However, its default GID type is IB_GID_TYPE_IB and not
IB_GID_TYPE_ROCE, even though both are mapped to the same enum value.
Change default GID type to IB_GID_TYPE_ROCE.

cma_iboe_set_mgid() is updated to reflect the RoCEv2 GID check.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 7baaa49a 14-Nov-2017 Parav Pandit <parav@mellanox.com>

RDMA/cma: Use correct size when writing netlink stats

The code was using the src size when formatting the dst. They are almost
certainly the same value but it reads wrong.

Fixes: ce117ffac2e9 ("RDMA/cma: Export AF_IB statistics")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# d0e312fe 05-Dec-2017 Leon Romanovsky <leon@kernel.org>

RDMA/netlink: Fix general protection fault

The RDMA netlink core code checks the validity of messages by ensuring
that the type and operand are in range. It works well for almost all
clients except NLDEV, which has a cb_table smaller than the number of operands.

A request to access such an operand will trigger the following kernel panic.

This patch updates all places where cb_table is declared for
consistency, but only NLDEV actually needs it.

general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
Modules linked in:
CPU: 0 PID: 522 Comm: syz-executor6 Not tainted 4.13.0+ #4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
task: ffff8800657799c0 task.stack: ffff8800695d000
RIP: 0010:rdma_nl_rcv_msg+0x13a/0x4c0
RSP: 0018:ffff8800695d7838 EFLAGS: 00010207
RAX: dffffc0000000000 RBX: 1ffff1000d2baf0b RCX: 00000000704ff4d7
RDX: 0000000000000000 RSI: ffffffff81ddb03c RDI: 00000003827fa6bc
RBP: ffff8800695d7900 R08: ffffffff82ec0578 R09: 0000000000000000
R10: ffff8800695d7900 R11: 0000000000000001 R12: 000000000000001c
R13: ffff880069d31e00 R14: 00000000ffffffff R15: ffff880069d357c0
FS: 00007fee6acb8700(0000) GS:ffff88006ca00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000201a9000 CR3: 0000000059766000 CR4: 00000000000006b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
? rdma_nl_multicast+0x80/0x80
rdma_nl_rcv+0x36b/0x4d0
? ibnl_put_attr+0xc0/0xc0
netlink_unicast+0x4bd/0x6d0
? netlink_sendskb+0x50/0x50
? drop_futex_key_refs.isra.4+0x68/0xb0
netlink_sendmsg+0x9ab/0xbd0
? nlmsg_notify+0x140/0x140
? wake_up_q+0xa1/0xf0
? drop_futex_key_refs.isra.4+0x68/0xb0
sock_sendmsg+0x88/0xd0
sock_write_iter+0x228/0x3c0
? sock_sendmsg+0xd0/0xd0
? do_futex+0x3e5/0xb20
? iov_iter_init+0xaf/0x1d0
__vfs_write+0x46e/0x640
? sched_clock_cpu+0x1b/0x190
? __vfs_read+0x620/0x620
? __fget+0x23a/0x390
? rw_verify_area+0xca/0x290
vfs_write+0x192/0x490
SyS_write+0xde/0x1c0
? SyS_read+0x1c0/0x1c0
? trace_hardirqs_on_thunk+0x1a/0x1c
entry_SYSCALL_64_fastpath+0x18/0xad
RIP: 0033:0x7fee6a74a219
RSP: 002b:00007fee6acb7d58 EFLAGS: 00000212 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000638000 RCX: 00007fee6a74a219
RDX: 0000000000000078 RSI: 0000000020141000 RDI: 0000000000000006
RBP: 0000000000000046 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000212 R12: ffff8800695d7f98
R13: 0000000020141000 R14: 0000000000000006 R15: 00000000ffffffff
Code: d6 48 b8 00 00 00 00 00 fc ff df 66 41 81 e4 ff 03 44 8d 72 ff 4a 8d 3c b5 c0 a6 7f 82 44 89 b5 4c ff ff ff 48 89 f9 48 c1 e9 03 <0f> b6 0c 01 48 89 f8 83 e0 07 83 c0 03 38 c8 7c 08 84 c9 0f 85
RIP: rdma_nl_rcv_msg+0x13a/0x4c0 RSP: ffff8800695d7838
---[ end trace ba085d123959c8ec ]---
Kernel panic - not syncing: Fatal exception

Cc: syzkaller <syzkaller@googlegroups.com>
Fixes: b4c598a67ea1 ("RDMA/netlink: Implement nldev device dumpit calback")
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

Signed-off-by: Doug Ledford <dledford@redhat.com>


# 23a9cd2a 26-Nov-2017 Moni Shoua <monis@mellanox.com>

RDMA/cma: Make sure that PSN is not over max allowed

This patch limits the initial value for PSN to 24 bits as
spec requires.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
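
A minimal sketch of the constraint being enforced: the PSN is a 24-bit
field in the IB spec, so a randomly chosen starting value has to be masked
before it is programmed into the QP. Plain C, with illustrative names
rather than the exact fields used in cma.c:

#include <stdint.h>
#include <stdlib.h>

#define PSN_MASK 0xffffffu      /* PSN is a 24-bit field per the spec */

/* Clamp a randomly chosen starting PSN to the 24 bits the spec allows. */
static uint32_t choose_initial_psn(void)
{
        uint32_t seq_num = (uint32_t)rand();

        return seq_num & PSN_MASK;
}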


# e08ce2e8 07-Nov-2017 Yuval Shaia <yuval.shaia@oracle.com>

RDMA/core: Make function rdma_copy_addr return void

The function always returns zero, so make it void.

While at it, make the struct net_device parameter const.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# c0b64f58 11-Oct-2017 Bart Van Assche <bvanassche@acm.org>

RDMA/cma: Avoid triggering undefined behavior

According to the C standard the behavior of computations with
integer operands is as follows:
* A computation involving unsigned operands can never overflow,
because a result that cannot be represented by the resulting
unsigned integer type is reduced modulo the number that is one
greater than the largest value that can be represented by the
resulting type.
* The behavior for signed integer underflow and overflow is
undefined.

Hence only use unsigned integers when checking for integer
overflow.

This patch is what I came up with after having analyzed the
following smatch warnings:

drivers/infiniband/core/cma.c:3448: cma_resolve_ib_udp() warn: signed overflow undefined. 'offset + conn_param->private_data_len < conn_param->private_data_len'
drivers/infiniband/core/cma.c:3505: cma_connect_ib() warn: signed overflow undefined. 'offset + conn_param->private_data_len < conn_param->private_data_len'

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
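
A self-contained sketch of the pattern the patch moves to: with unsigned
operands the wrap-around is well defined, so "a + b < b" is a valid
overflow test, whereas the same expression on signed ints is undefined
behavior. Names are illustrative:

#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if adding the private-data length to the header offset
 * would wrap around. Because both operands are unsigned (size_t), the
 * addition wraps modulo SIZE_MAX + 1 and the comparison is well defined.
 */
static bool private_data_overflows(size_t offset, size_t private_data_len)
{
        return offset + private_data_len < private_data_len;
}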


# 5ab2d89b 17-Aug-2017 Leon Romanovsky <leon@kernel.org>

IB/cma: Fix erroneous validation of supported default GID type

When rdma_cm initializes a cma_device, it checks whether the device
supports the preferred default GID type. This check was done incorrectly,
so rdma_cm sometimes came up with a default GID type that is not
supported by the device.

Fix that by checking for a supported GID type properly.

Fixes: 3c7f67d1880d ("IB/cma: Fix default RoCE type setting")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# e3bf14bd 14-Aug-2017 Jason Gunthorpe <jgg@ziepe.ca>

rdma: Autoload netlink client modules

If a message comes in and we do not have the client in the table, then
try to load the module supplying that client using MODULE_ALIAS to find
it.

This duplicates the scheme seen in other netlink muxes (eg nfnetlink).

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
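
The general shape of the scheme, as a hedged sketch: the mux requests a
module by a synthesized alias when a message arrives for an unregistered
client, and the client module declares a matching MODULE_ALIAS so modprobe
can find it. The alias string and helper below are illustrative, not the
exact ones used by rdma_netlink:

#include <linux/module.h>
#include <linux/kmod.h>

/* Sketch: if no client has registered for this index, ask modprobe to
 * load the module that advertises the corresponding alias. */
static void maybe_autoload_client(int index, bool registered)
{
        if (!registered)
                request_module("rdma-netlink-subsys-%d", index);
}

/* A client module advertises the matching alias (index 4 here is just an
 * example value). */
MODULE_ALIAS("rdma-netlink-subsys-4");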


# 3250b4db 19-Jun-2017 Leon Romanovsky <leon@kernel.org>

RDMA/netlink: Rename netlink callback struct

The RDMA netlink client infrastructure was removed and made obsolete.
The old infrastructure defined struct ibnl_client_cbs. Now that all
uses of this have been updated to the new infrastructure, rename the
struct to be compliant with the current stack naming standards:
struct rdma_nl_cbs.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>


# 64401b69 30-May-2017 Leon Romanovsky <leon@kernel.org>

RDMA/netlink: Remove redundant owner option for netlink callbacks

The owner field does not need to be set because netlink is part of
ib_core, which is unloaded last, after all other modules.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>


# c9901724 05-Jun-2017 Leon Romanovsky <leon@kernel.org>

RDMA/netlink: Remove netlink clients infrastructure

RDMA netlink has a complicated infrastructure for dynamically
registering and de-registering netlink clients to the NETLINK_RDMA
group. The complicated portion of this code is not widely used because
2 of the 3 current clients are statically compiled together with
netlink.c. The infrastructure, therefore, is deemed overkill.

Refactor the code to eliminate the dynamically added clients. Now all
clients are pre-registered in a client array at compile time, and at run
time they merely check-in with the infrastructure to pass their callback
table for inclusion in the pre-sized client array.

This also allows for future cleanups and removal of unneeded code in the
iwcm* netlink handler.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Chien Tin Tung <chien.tin.tung@intel.com>


# 3c7f67d1 28-Jul-2017 Doug Ledford <dledford@redhat.com>

IB/cma: Fix default RoCE type setting

The initial patch for changing the stack to use RoCEv2 GIDs by default
set the CMA_PREFERRED_ROCE_GID_TYPE to an incorrect value. Instead of
an absolute value, we needed to set the right bit in a bitmask. Correct
the default setting so we use RoCEv2 by default.

Fixes: 63a5f483af0e (IB/cma: Set default gid type to RoCEv2)
Signed-off-by: Doug Ledford <dledford@redhat.com>
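
The bug class is easy to see in a small sketch: a preferred-GID-type macro
defined as an enum value but then tested as if it were a bitmask. The enum
values below are illustrative, not the kernel's:

/* GID types as a small enum; values are illustrative. */
enum gid_type { GID_TYPE_IB_ROCE_V1 = 0, GID_TYPE_ROCE_V2 = 1 };

/* Wrong: an absolute enum value where a bitmask is expected, so a test
 * like (supported_gid_types & CMA_PREFERRED_ROCE_GID_TYPE) misbehaves. */
/* #define CMA_PREFERRED_ROCE_GID_TYPE GID_TYPE_ROCE_V2 */

/* Right: set the corresponding bit in the mask. */
#define CMA_PREFERRED_ROCE_GID_TYPE (1 << GID_TYPE_ROCE_V2)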


# be1d325a 12-Jun-2017 Noa Osherovich <noaos@mellanox.com>

IB/core: Set RoCEv2 MGID according to spec

The RoCEv2 Annex states that for RoCEv2 over IPv4, the corresponding
IPv4 address is encoded into the GID according to the following rule:
GID = ::ffff:<IPv4 address>

Remove the 0xff0e prefix for RoCEv2 packets with IPv4, leave it zeroed,
and change rdma_is_multicast_addr() to take the new logic into account.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
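
For RoCEv2 over IPv4 the multicast GID is simply the IPv4-mapped IPv6 form
of the group address; a small plain-C sketch of the encoding rule, with an
illustrative helper name:

#include <stdint.h>
#include <string.h>

/*
 * Encode an IPv4 group address as a RoCEv2 MGID: leave the first ten
 * bytes zero, set bytes 10-11 to 0xff, and copy the IPv4 address into
 * the last four bytes (i.e. ::ffff:<IPv4>), with no 0xff0e prefix.
 */
static void ipv4_to_rocev2_mgid(uint32_t ipv4_be, uint8_t mgid[16])
{
        memset(mgid, 0, 16);
        mgid[10] = 0xff;
        mgid[11] = 0xff;
        memcpy(&mgid[12], &ipv4_be, 4);
}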


# 63a5f483 30-May-2017 Moni Shoua <monis@mellanox.com>

IB/cma: Set default gid type to RoCEv2

RoCEv2 is the preferred RDMA protocol for Ethernet link layer because
of its advantages over RoCEv1. For better user experience make it the
default choice for RDMA_CM connections if device/port support it.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# a62ab66b 14-Jul-2017 Ismail, Mustafa <mustafa.ismail@intel.com>

RDMA/core: Initialize port_num in qp_attr

Initialize the port_num for iWARP in rdma_init_qp_attr.

Fixes: 5ecce4c9b17b ("Check port number supplied by user verbs cmds")
Cc: <stable@vger.kernel.org> # v2.6.14+
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# cbd09aeb 23-May-2017 Moni Shoua <monis@mellanox.com>

IB/core: Don't resolve IP address to the loopback device

When resolving an IP address that is on the caller's host, the result
from querying the routing table is the loopback device. This is not a
valid response, because it doesn't represent the RDMA device and the
port.

Therefore, callers need to check the resolved device and, if it is a
loopback device, find an alternative way to resolve the address. To avoid
this we make sure that the response from rdma_resolve_ip() will not be
the loopback device.

While at it, we fix a static checker warning about dereferencing an
uninitialized pointer, using the same solution as in commit abeffce90c7f
("net/mlx5e: Fix a -Wmaybe-uninitialized warning") as a reference.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# d3957b86 21-May-2017 Majd Dibbiny <majd@mellanox.com>

RDMA/SA: Fix kernel panic in CMA request handler flow

Commit 9fdca4da4d8c (IB/SA: Split struct sa_path_rec based on IB and
ROCE specific fields) moved the service_id to be an attribute specific
to the IB and OPA SA Path Records, so it was no longer assigned for RoCE.

This caused the following kernel panic in the CMA request handler flow:

[ 27.074594] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 27.074731] IP: __radix_tree_lookup+0x1d/0xe0
...
[ 27.075356] Workqueue: ib_cm cm_work_handler [ib_cm]
[ 27.075401] task: ffff88022e3b8000 task.stack: ffffc90001298000
[ 27.075449] RIP: 0010:__radix_tree_lookup+0x1d/0xe0
...
[ 27.075979] Call Trace:
[ 27.076015] radix_tree_lookup+0xd/0x10
[ 27.076055] cma_ps_find+0x59/0x70 [rdma_cm]
[ 27.076097] cma_id_from_event+0xd2/0x470 [rdma_cm]
[ 27.076144] ? ib_init_ah_from_path+0x39a/0x590 [ib_core]
[ 27.076193] cma_req_handler+0x25/0x480 [rdma_cm]
[ 27.076237] cm_process_work+0x25/0x120 [ib_cm]
[ 27.076280] ? cm_get_bth_pkey.isra.62+0x3c/0xa0 [ib_cm]
[ 27.076350] cm_req_handler+0xb03/0xd40 [ib_cm]
[ 27.076430] ? sched_clock_cpu+0x11/0xb0
[ 27.076478] cm_work_handler+0x194/0x1588 [ib_cm]
[ 27.076525] process_one_work+0x160/0x410
[ 27.076565] worker_thread+0x137/0x4a0
[ 27.076614] kthread+0x112/0x150
[ 27.076684] ? max_active_store+0x60/0x60
[ 27.077642] ? kthread_park+0x90/0x90
[ 27.078530] ret_from_fork+0x2c/0x40

This patch moves it back to the common SA Path Record structure
and removes the redundant setter and getter.

Tested on Connect-IB and ConnectX-4 in InfiniBand and RoCE respectively.

Fixes: 9fdca4da4d8c (IB/SA: Split struct sa_path_rec based on IB and
ROCE specific fields)
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 4c33bd19 27-Apr-2017 Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>

IB/SA: Add support to query OPA path records

When bit 26 of the capmask2 field in the OPA classport info
query is set, the SA will query for OPA path records instead
of querying for IB path records. Note that OPA
path records can only be queried by kernel ULPs.
Userspace clients continue to query IB path records.

Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 9fdca4da 27-Apr-2017 Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>

IB/SA: Split struct sa_path_rec based on IB and ROCE specific fields

sa_path_rec now contains a union of sa_path_rec_ib and sa_path_rec_roce
based on the type of the path record. Note that fields applicable to
path record type ROCE v1 and ROCE v2 fall under sa_path_rec_roce.
Accessor functions are added to these fields so the caller doesn't have
to know the type.

Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
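
The shape of the change is a common C pattern: keep the shared fields in
the outer struct, push the transport-specific ones into a tagged union,
and hide the tag behind accessors. A hedged sketch with illustrative field
names, not the exact kernel layout:

#include <stdint.h>

enum sa_path_rec_type { SA_PATH_REC_IB, SA_PATH_REC_ROCE_V1, SA_PATH_REC_ROCE_V2 };

struct sa_path_rec_ib_part   { uint16_t dlid, slid; };
struct sa_path_rec_roce_part { uint8_t dmac[6];     };

struct sa_path_rec_sketch {
        uint64_t service_id;            /* shared fields stay in the outer struct */
        enum sa_path_rec_type rec_type;
        union {                         /* transport-specific fields */
                struct sa_path_rec_ib_part   ib;
                struct sa_path_rec_roce_part roce;
        };
};

/* Accessor so callers never inspect the tag themselves. */
static inline uint16_t sa_path_get_slid(const struct sa_path_rec_sketch *rec)
{
        return rec->rec_type == SA_PATH_REC_IB ? rec->ib.slid : 0;
}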


# dfa834e1 27-Apr-2017 Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>

IB/SA: Introduce path record specific types

struct sa_path_rec has a gid_type field. This patch introduces a more
generic path-record-specific type, 'rec_type', which is either IB, ROCE v1
or ROCE v2. The patch also provides conversion functions to get
a gid type from a path record type and vice versa.

Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# c2f8fc4e 27-Apr-2017 Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>

IB/SA: Rename ib_sa_path_rec to sa_path_rec

Rename ib_sa_path_rec to a more generic sa_path_rec.
This is part of extending ib_sa to also support OPA
path records in addition to the IB defined path records.

Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# d8966fcd 29-Apr-2017 Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>

IB/core: Use rdma_ah_attr accessor functions

Modify core and driver components to use accessor functions
introduced to access individual fields of rdma_ah_attr

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# ee1c60b1 20-Mar-2017 Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>

IB/SA: Modify SA to implicitly cache Class Port info

The SA will query and cache class port info as part of
its initialization. The SA will also invalidate and
refresh the cache based on specific events. Callers such
as IPoIB and the CM can query the SA to get the classportinfo
information. Apart from making the caller code much simpler,
this change puts the onus on the SA to query and maintain
classportinfo, much like how it maintains the address handle to the SM.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 61c0ddbe 15-Apr-2017 Moni Shoua <monis@mellanox.com>

IB/cma: Send MRA for reply messages

The current implementation of RDMA_CM sends an MRA (Message Receipt
Acknowledgment) only for request messages, not for response messages.

As a result, a slow active side of the connection may send a ready-to-use
message to the passive side after a delay that is too long for the passive
side to wait for.

This patch adds a call to ib_send_cm_mra() upon receiving a response
message and thereby tells the other side to increase the service timeout
to 16 times its previous value. As in the request case, an MRA
for a reply is sent only if a duplicate response has arrived.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Matan Barak <matan@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# f2625f7d 21-Feb-2017 Steve Wise <larrystevenwise@gmail.com>

rdma_cm: fail iwarp accepts w/o connection params

cma_accept_iw() needs to return an error if conn_params is NULL.
Since this is coming from user space, we can crash.

Reported-by: Shaobo He <shaobo@cs.utah.edu>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Cc: stable@vger.kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
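
The fix amounts to rejecting a NULL conn_param before it is dereferenced;
a minimal sketch of the guard, with an illustrative function body rather
than the actual cma_accept_iw() implementation:

#include <rdma/rdma_cm.h>

/* Sketch: iWARP accepts carry mandatory connection parameters, so a
 * NULL conn_param coming from user space must be rejected up front. */
static int cma_accept_iw_sketch(struct rdma_cm_id *id,
                                struct rdma_conn_param *conn_param)
{
        if (!conn_param)
                return -EINVAL;

        /* ... build the iw_cm connection parameters from conn_param and
         * hand them to the iWARP CM ... */
        return 0;
}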


# 6df6b4a9 13-Feb-2017 Moni Shoua <monis@mellanox.com>

IB/cma: Destination and source addr families must match

The destination address in a listening rdma_id does not have an address
family. Since the address family on both sides of a connection must be the
same, in rdma_bind_addr() we set the address family of the destination to
the address family of the source.

This patch serves the logic in cma_port_is_unique(), which needs to
know whether the destination address associated with an rdma_id is a
wildcard address (cma_zero_addr() and cma_loopback_addr()).

This can happen when port reuse is checked for a port number
that is being listened to.

Fixes: 19b752a19dce ("IB/cma: Allow port reuse for rdma_id")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 89052d78 13-Feb-2017 Majd Dibbiny <majd@mellanox.com>

IB/cma: Add default RoCE TOS to CMA configfs

Add a new entry to the RDMA-CM configfs that allows users
to select a default TOS for RDMA-CM QPs.

This is useful for users who want to control the TOS for legacy
applications without changing their code.

Applications that set the TOS explicitly using the rdma_set_option
API will continue to work as expected, overriding the configfs
value.

CC: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# d0d7b10b 04-Feb-2017 Parav Pandit <parav@mellanox.com>

net-next: treewide use is_vlan_dev() helper function.

This patch uses the is_vlan_dev() helper function instead of
open-coding the flag comparison that is_vlan_dev() performs.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jon Maxwell <jmaxwell37@gmail.com>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
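
The conversion is mechanical; a hedged kernel-style sketch (the wrapper
function is illustrative, the helpers come from <linux/if_vlan.h>):

#include <linux/if_vlan.h>
#include <linux/netdevice.h>

/* Sketch: return the VLAN id of a netdev, or 0xffff if it is not a VLAN
 * device. is_vlan_dev() replaces the open-coded priv_flags test. */
static u16 sketch_get_vlan_id(struct net_device *ndev)
{
        /* Before: if (ndev->priv_flags & IFF_802_1Q_VLAN) ... */
        if (is_vlan_dev(ndev))
                return vlan_dev_vlan_id(ndev);
        return 0xffff;
}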


# 24dc831b 25-Jan-2017 Yuval Shaia <yuval.shaia@oracle.com>

IB/core: Add inline function to validate port

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# a3dd3a48 27-Jan-2017 Christophe Jaillet <christophe.jaillet@wanadoo.fr>

IB/cma: Fix reversed test

This test looks reversed.
We should log an error message only if 'ib_attach_mcast()' fails.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# b4cfe397 15-Jan-2017 Jack Morgenstein <jackm@dev.mellanox.co.il>

RDMA/cma: Fix unknown symbol when CONFIG_IPV6 is not enabled

If IPV6 has not been enabled in the underlying kernel, we must avoid
calling IPV6 procedures in rdma_cm.ko.

This requires using "IS_ENABLED(CONFIG_IPV6)" in "if" statements
surrounding any code which calls external IPV6 procedures.

In the instance fixed here, procedure cma_bind_addr() called
ipv6_addr_type() -- which resulted in calling external procedure
__ipv6_addr_type().

Fixes: 6c26a77124ff ("RDMA/cma: fix IPv6 address resolution")
Cc: <stable@vger.kernel.org> # v4.2+
Cc: Spencer Baugh <sbaugh@catern.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 19b752a1 05-Jan-2017 Moni Shoua <monis@mellanox.com>

IB/cma: Allow port reuse for rdma_id

When allocating a port number for binding to an rdma_id, assuming the
allocation is not for a specific port, the rule is to allow only ports
that were not previously in use by any other rdma_id.

This condition is stronger than needed to achieve the goal of a unique
5-tuple per rdma_id. Instead, we can compare the current rdma_id with
other rdma_ids and allow port reuse as long as they differ in destination
port, source address, or destination address.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 498683c6 05-Jan-2017 Moni Shoua <monis@mellanox.com>

IB/cma: Add debug messages to error flows

Print debug messages to the kernel log to add more
information about RDMA_CM events that indicate an error.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 102c5ce0 02-Jan-2017 Jack Wang <jinpu.wang@profitbricks.com>

RDMA/cma: use cached port state when bind loopback

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Reviewed-by: Michael Wang <yun.wang@profitbricks.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 93b1f29d 02-Jan-2017 Jack Wang <jinpu.wang@profitbricks.com>

RDMA/cma: resolve to first active ib port

When we try to resolve a destination address without supplying a source
address, the cma core will try to resolve to a local source IB device
automatically. The current logic only checks whether a given port has the
same subnet_prefix as the destination, which is not enough when the
default well-known subnet_prefix is used on the active port: it will be
the same as the subnet_prefix on inactive ports, and we might match
against an inactive port by accident. To resolve this, we should also
check that a port is active before selecting it as a suitable source for
a given destination.

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Reviewed-by: Michael Wang <yun.wang@profitbricks.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 5f244104 26-Oct-2016 Steve Wise <larrystevenwise@gmail.com>

rdma_cm: add rdma_consumer_reject_data helper function

rdma_consumer_reject_data() will return the private data pointer
and length if any is available.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 5042a73d 26-Oct-2016 Steve Wise <larrystevenwise@gmail.com>

rdma_cm: add rdma_is_consumer_reject() helper function

Return true if the peer consumer application rejected the
connection attempt.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 77a5db13 26-Oct-2016 Steve Wise <larrystevenwise@gmail.com>

rdma_cm: add rdma_reject_msg() helper function

rdma_reject_msg() returns a pointer to a string message associated with
the transport reject reason codes.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
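
Taken together, the three helpers above let a ULP turn a reject event into
something readable. A hedged usage sketch inside a CM event handler; the
surrounding handler and log messages are illustrative:

#include <rdma/rdma_cm.h>

/* Sketch: log a human-readable reason for an RDMA_CM_EVENT_REJECTED event
 * and pull out any consumer-supplied private data. */
static void log_reject(struct rdma_cm_id *id, struct rdma_cm_event *event)
{
        const void *data;
        u8 len = 0;

        pr_info("connection rejected: %s (status %d)\n",
                rdma_reject_msg(id, event->status), event->status);

        if (rdma_is_consumer_reject(id, event->status)) {
                data = rdma_consumer_reject_data(id, event, &len);
                if (data)
                        pr_info("peer supplied %u bytes of reject data\n", len);
        }
}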


# c7d03a00 16-Nov-2016 Alexey Dobriyan <adobriyan@gmail.com>

netns: make struct pernet_operations::id unsigned int

Make struct pernet_operations::id unsigned.

There are 2 reasons to do so:

1)
This field is really an index into a zero-based array and is thus an
unsigned entity. Using a negative value is an out-of-bounds access by
definition.

2)
On x86_64, unsigned 32-bit data that is mixed with pointers via array
indexing, or via offsets added to or subtracted from pointers, is
preferred over signed 32-bit data.

An "int" being used as an array index needs to be sign-extended
to 64-bit before being used.

void f(long *p, int i)
{
        g(p[i]);
}

roughly translates to

movsx rsi, esi
mov rdi, [rsi+...]
call g

MOVSX is a 3-byte instruction that isn't necessary if the variable is
unsigned, because x86_64 zero-extends by default.

Now, there is net_generic() function which, you guessed it right, uses
"int" as an array index:

static inline void *net_generic(const struct net *net, int id)
{
        ...
        ptr = ng->ptr[id - 1];
        ...
}

And this function is used a lot, so those sign extensions add up.

Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
messing with code generation):

add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

Unfortunately some functions actually grow bigger.
This is a seemingly random artefact of code generation, with the register
allocator being used differently. gcc decides that some variable
needs to live in new r8+ registers and every access now requires a REX
prefix. Or it is shifted into r12, so the [r12+0] addressing mode has to
be used, which is longer than [r8].

However, overall balance is in negative direction:

add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
function old new delta
nfsd4_lock 3886 3959 +73
tipc_link_build_proto_msg 1096 1140 +44
mac80211_hwsim_new_radio 2776 2808 +32
tipc_mon_rcv 1032 1058 +26
svcauth_gss_legacy_init 1413 1429 +16
tipc_bcbase_select_primary 379 392 +13
nfsd4_exchange_id 1247 1260 +13
nfsd4_setclientid_confirm 782 793 +11
...
put_client_renew_locked 494 480 -14
ip_set_sockfn_get 730 716 -14
geneve_sock_add 829 813 -16
nfsd4_sequence_done 721 703 -18
nlmclnt_lookup_host 708 686 -22
nfsd4_lockt 1085 1063 -22
nfs_get_client 1077 1050 -27
tcf_bpf_init 1106 1076 -30
nfsd4_encode_fattr 5997 5930 -67
Total: Before=154856051, After=154854321, chg -0.00%

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# aeb76df4 30-Oct-2016 Leon Romanovsky <leon@kernel.org>

IB/core: Set routable RoCE gid type for ipv4/ipv6 networks

If the underlying network type is ipv4 or ipv6 and the device supports
routable RoCE, prefer it so the traffic could cross subnets.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# c50e90d0 10-Nov-2016 Arnd Bergmann <arnd@arndb.de>

infiniband: shut up a maybe-uninitialized warning

Some configurations produce this harmless warning when built with gcc
-Wmaybe-uninitialized:

infiniband/core/cma.c: In function 'cma_get_net_dev':
infiniband/core/cma.c:1242:12: warning: 'src_addr_storage.sin_addr.s_addr' may be used uninitialized in this function [-Wmaybe-uninitialized]

I previously reported this for the powerpc64 defconfig, but have now
reproduced the same thing for x86 as well, using gcc-5 or higher.

The code looks correct to me, and this change just rearranges it by
making sure we always initialize the entire address structure to make the
warning disappear. My first approach added an initialization at the
time of the declaration, which Doug commented may be too costly, so I
hope this version doesn't add overhead.

Link: http://arm-soc.lixom.net/buildlogs/mainline/v4.7-rc6/buildall.powerpc.ppc64_defconfig.log.passed
Link: https://patchwork.kernel.org/patch/9212825/
Acked-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# dee9acbb 15-Aug-2016 Bhaktipriya Shridhar <bhaktipriya96@gmail.com>

IB/cma: Remove deprecated create_singlethread_workqueue

alloc_ordered_workqueue() with WQ_MEM_RECLAIM set replaces the
deprecated create_singlethread_workqueue(). This is an identity
conversion.

The workqueue "cma_wq" queues work item cma_work_handler. It has been
identity converted.

WQ_MEM_RECLAIM has been set to ensure forward progress under
memory pressure.

Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
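
A hedged sketch of the identity conversion described above; the wrapper
function and error handling are illustrative, and the queue name is only
an example:

#include <linux/workqueue.h>
#include <linux/errno.h>

static struct workqueue_struct *cma_wq;

static int cma_wq_init_sketch(void)
{
        /* Before: cma_wq = create_singlethread_workqueue("rdma_cm"); */
        cma_wq = alloc_ordered_workqueue("rdma_cm", WQ_MEM_RECLAIM);
        if (!cma_wq)
                return -ENOMEM;
        return 0;
}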


# 23d70503 05-Aug-2016 Wei Yongjun <weiyj.lk@gmail.com>

IB/core: Fix possible memory leak in cma_resolve_iboe_route()

'work' and 'route->path_rec' are allocated in cma_resolve_iboe_route()
and should be freed before leaving in the error handling cases;
otherwise memory will leak.

Fixes: 200298326b27 ('IB/core: Validate route when we init ah')
Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# ab15c95a 06-Jul-2016 Alex Vesker <valex@mellanox.com>

IB/core: Support for CMA multicast join flags

Added UCMA and CMA support for multicast join flags. Flags are
passed using previously reserved fields of the UCMA join command.
Currently supporting two join flags indicating two different
multicast JoinStates:

1. Full Member:
The initiator creates the Multicast group(MCG) if it wasn't
previously created, can send Multicast messages to the group
and receive messages from the MCG.

2. Send Only Full Member:
The initiator creates the Multicast group(MCG) if it wasn't
previously created, can send Multicast messages to the group
but doesn't receive any messages from the MCG.

IB: Send Only Full Member requires a query of ClassPortInfo
to determine if SM/SA supports this option. If SM/SA
doesn't support Send-Only there will be no join request
sent and an error will be returned.

ETH: When Send Only Full Member is requested no IGMP join
will be sent.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# c65f6c5a 22-Jun-2016 Alex Vesker <valex@mellanox.com>

IB/core: Fix RoCE v1 multicast join logic issue

During multicast join of RoCEv1, IGMP join state and max hop limit
were updated incorrectly. IGMP join should be sent and marked as
joined only on RoCEv2 after a successful join. Max hops should be
updated to the hop limit on RoCEv2 regardless of the join state.

Fixes: bee3c3c91865 ('IB/cma: Join and leave multicast groups...')
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 37e07cda 10-Jun-2016 Bart Van Assche <bvanassche@acm.org>

IB/cma: Make the code easier to verify

Static source code analysis tools like smatch cannot handle functions
that lock or not lock a mutex depending on the value of the arguments.
Hence inline the function cma_disable_callback(). Additionally, this
patch realizes a small performance optimization by reducing the number of
mutex_lock() and mutex_unlock() calls in the modified functions. With
this patch applied smatch no longer complains about source file cma.c.
Without this patch smatch reports the following for this source file:

drivers/infiniband/core/cma.c:1959: cma_req_handler() warn: inconsistent returns 'mutex:&listen_id->handler_mutex'.
Locked on: line 1880
line 1959
Unlocked on: line 1941
drivers/infiniband/core/cma.c:2112: iw_conn_req_handler() warn: inconsistent returns 'mutex:&listen_id->handler_mutex'.
Locked on: line 2048
Unlocked on: line 2112

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 2fa2d4fb 06-May-2016 Mark Bloch <markb@mellanox.com>

IB/core: Fix a potential array overrun in CMA and SA agent

Fix an array overrun when going over the callback table.
The callback table is declared without a maximum size; the size is only
provided at registration time.

There is a potential scenario where a new operation is added
that is not supported by the current client. Accepting such an
operation in ib_netlink would cause an array overrun.

Fixes: 809d5fc9bf65 ("infiniband: pass rdma_cm module to netlink_dump_start")
Fixes: b493d91d333e ("iwcm: common code for port mapper")
Fixes: 2ca546b92a02 ("IB/sa: Route SA pathrecord query through netlink")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 0691a286 03-May-2016 Christoph Hellwig <hch@lst.de>

IB/cma: pass the port number to ib_create_qp

The new RW API will need this.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# aba25a3e 01-Mar-2016 Parav Pandit <pandit.parav@gmail.com>

IB/core: trivial printk cleanup.

1. Replaced printk with appropriate pr_warn, pr_err, pr_info.
2. Removed unnecessary prints around memory allocation failures,
as reported by the checkpatch script.

Signed-off-by: Parav Pandit <pandit.parav@gmail.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 84424a7f 29-Feb-2016 Haggai Eran <haggaie@mellanox.com>

IB/cma: Print warning on different inner and header P_Keys

Commit 4c21b5bcef73 ("IB/cma: Add net_dev and private data checks to RDMA
CM") added checks for incoming RDMA CM requests that they can be matched to
a netdev based on the P_Key in the BTH of the request. This behavior was
reverted in commit ab3964ad2acf ("IB/cma: Use inner P_Key to determine
netdev"), since the mlx5 and ipath drivers didn't send the correct value
in the BTH P_Key.

Since the ipath driver was removed, and the mlx5 driver can now send GSI
packets on different P_Keys, we could revert the patch to let the rdma_cm
module look on the BTH P_Key when deciding to what netdev a packet belongs.
However, that still breaks compatibility with the older drivers.

Change the behavior to print a warning when receiving a request that has a
different BTH P_Key and inner payload P_Key. In the future, after users
have seen the warnings and upgraded their setups, remove the warning and
block these requests.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# c3efe750 04-Jan-2016 Matan Barak <matanb@mellanox.com>

IB/core: Use hop-limit from IP stack for RoCE

Previously, IPV6_DEFAULT_HOPLIMIT was used as the hop limit value for
RoCE. Fixing that by taking ip4_dst_hoplimit and ip6_dst_hoplimit as
hop limit values.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
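
A hedged sketch of taking the hop limit from the resolved route instead of
a fixed IPV6_DEFAULT_HOPLIMIT. The helpers are the in-tree
ip4_dst_hoplimit()/ip6_dst_hoplimit(); the wrapper itself is illustrative:

#include <net/route.h>
#include <net/ip6_route.h>

/* Pick the hop limit the IP stack would use for this destination,
 * instead of hard-coding IPV6_DEFAULT_HOPLIMIT. */
static int sketch_route_hoplimit(struct dst_entry *dst, bool is_ipv4)
{
        return is_ipv4 ? ip4_dst_hoplimit(dst) : ip6_dst_hoplimit(dst);
}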


# 64936773 07-Jan-2016 Matan Barak <matanb@mellanox.com>

IB/cma: Fix RDMA port validation for iWarp

cma_validate_port wrongly assumed that Ethernet devices are RoCE
devices, and thus that their ndev should be matched in the GID table.
This broke iWarp support. Fix that by matching the ndev only if
we are working on a RoCE port.

Cc: <stable@vger.kernel.org> # 4.4.x-
Fixes: abae1b71dd37 ('IB/cma: cma_validate_port should verify the port
and netdevice')
Reported-by: Hariprasad Shenai <hariprasad@chelsio.com>
Tested-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# bee3c3c9 23-Dec-2015 Moni Shoua <monis@mellanox.com>

IB/cma: Join and leave multicast groups with IGMP

Since RoCEv2 is a protocol over IP header it is required to send IGMP
join and leave requests to the network when joining and leaving
multicast groups.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 045959db 23-Dec-2015 Matan Barak <matanb@mellanox.com>

IB/cma: Add configfs for rdma_cm

Users would like to control the behaviour of rdma_cm.
For example, old applications which don't set the
required RoCE gid type could be executed on RoCE V2
network types. In order to support this configuration,
we implement a configfs for rdma_cm.

In order to use the configfs, one needs to mount it and
mkdir <IB device name> inside rdma_cm directory.

The patch adds support for a single configuration file,
default_roce_mode. The mode can either be "IB/RoCE v1" or
"RoCE v2".

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 218a773f 23-Dec-2015 Matan Barak <matanb@mellanox.com>

IB/rdma_cm: Add wrapper for cma reference count

Currently, cma users can't increase or decrease the cma reference
count. This is necessary when setting cma attributes (like the
default GID type) in order to avoid use-after-free errors.
Add cma_ref_dev and cma_deref_dev APIs.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 20029832 23-Dec-2015 Matan Barak <matanb@mellanox.com>

IB/core: Validate route when we init ah

In order to make sure API users don't try to use SGIDs which don't
conform to the routing table, validate the route before searching
the RoCE GID table.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# c865f246 23-Dec-2015 Somnath Kotur <Somnath.Kotur@Avagotech.Com>

IB/core: Add rdma_network_type to wc

Providers should tell the IB core the wc's network type.
This is used in order to search for the proper GID in the
GID table. When using HCAs that can't provide this info,
the IB core tries to deeply examine the packet and extract
the GID type by itself.
We choose sgid_index and type from all the matching entries in
RDMA-CM based on hint from the IP stack and we set hop_limit for
the IP packet based on above hint from IP stack.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Somnath Kotur <Somnath.Kotur@Avagotech.Com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# cb57bb84 23-Dec-2015 Matan Barak <matanb@mellanox.com>

IB/cm: Use the source GID index type

Previously, the cm and cma modules supported only the IB and RoCE v1 GID
types. In order to support multiple GID types, the gid_type is passed to
cm_init_av_by_path and stored in the path record.

The rdma cm client would use a default GID type that will be saved in
rdma_id_private.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# b39ffa1d 23-Dec-2015 Matan Barak <matanb@mellanox.com>

IB/core: Add gid_type to gid attribute

In order to support multiple GID types, we need to store the gid_type
with each GID. This is also aligned with the RoCE v2 annex "RoCEv2 PORT
GID table entries shall have a "GID type" attribute that denotes the L3
Address type". The currently supported GID is IB_GID_TYPE_IB which is
also RoCE v1 GID type.

This implies that gid_type should be added to roce_gid_table meta-data.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# fac51590 21-Dec-2015 Matan Barak <matanb@mellanox.com>

IB/cma: cma_match_net_dev needs to take into account port_num

Previously, cma_match_net_dev called cma_protocol_roce which
tried to verify that the IB device uses RoCE protocol. However,
if rdma_id wasn't bound to a port, then the check would occur
against the first port of the device without regard to whether
that port was even of the same type as the type of port the
incoming packet was received on.

Fix this by passing the port of the request and only checking
against the same port of the device.

Reported-by: Or Gerlitz <gerlitz.or@gmail.com>
Fixes: b8cab5dab15f ('IB/cma: Accept connection without a valid netdev on RoCE')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 86bee4c9 18-Dec-2015 Or Gerlitz <ogerlitz@mellanox.com>

IB/core: Avoid calling ib_query_device

Use the cached copy of the attributes present on the device, except for
the case of a query originating from user-space, where we have to invoke
the driver query_device entry, so they can fill in their udata.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# d3632493 20-Nov-2015 Bart Van Assche <bvanassche@acm.org>

IB/cma: Add a missing rcu_read_unlock()

Ensure that validate_ipv4_net_dev() calls rcu_read_unlock() if
fib_lookup() fails. Detected by sparse. Compile-tested only.

Fixes: "IB/cma: Validate routing of incoming requests" (commit f887f2ac87c2).
Cc: Haggai Eran <haggaie@mellanox.com>
Cc: stable <stable@vger.kernel.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>

Signed-off-by: Doug Ledford <dledford@redhat.com>


# db7489e0 03-Aug-2015 Bart Van Assche <bvanassche@acm.org>

IB/core, cma: Make __attribute_const__ declarations sparse-friendly

Move the __attribute_const__ declarations such that sparse understands
that these apply to the function itself and not to the return type.
This avoids that sparse reports error messages like the following:

drivers/infiniband/core/verbs.c:73:12: error: symbol 'ib_event_msg' redeclared with different type (originally declared at include/rdma/ib_verbs.h:470) - different modifiers

Fixes: 2b1b5b601230 ("IB/core, cma: Nice log-friendly string helpers")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# fa20105e 22-Oct-2015 Guy Shapiro <guysh@mellanox.com>

IB/cma: Add support for network namespaces

Add support for network namespaces in the ib_cma module. This is
accomplished by:

1. Adding network namespace parameter for rdma_create_id. This parameter is
used to populate the network namespace field in rdma_id_private.
rdma_create_id keeps a reference on the network namespace.
2. Using the network namespace from the rdma_id instead of init_net inside
of ib_cma, when listening on an ID and when looking for an ID for an
incoming request.
3. Decrementing the reference count for the appropriate network namespace
when calling rdma_destroy_id.

In order to preserve the current behavior init_net is passed when calling
from other modules.

Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 4be74b42 22-Oct-2015 Haggai Eran <haggaie@mellanox.com>

IB/cma: Separate port allocation to network namespaces

Keep a struct for each network namespace containing the IDRs for the RDMA
CM port spaces. The struct is created dynamically using the generic_net
mechanism.

This patch is internal infrastructure work for the following patches. In
this patch, init_net is statically used as the network namespace for
the new port-space API.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 565edd1d 22-Oct-2015 Guy Shapiro <guysh@mellanox.com>

IB/addr: Pass network namespace as a parameter

Add network namespace support to the ib_addr module. For that, all the
address resolution and matching should be done using the appropriate
namespace instead of init_net.

This is achieved by:

1. Adding an explicit network namespace argument to exported function that
require a namespace.
2. Saving the namespace in the rdma_addr_client structure.
3. Using it when calling networking functions.

In order to preserve the behavior of calling modules, &init_net is
passed as the parameter in calls from other modules. This is modified as
namespace support is added on more levels.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 10e07f13 15-Oct-2015 Matan Barak <matanb@mellanox.com>

IB/core: Remove smac and vlan id from path record

The GID cache accompanies every GID with attributes.
The GID attributes link the GID with its netdevice, which could be
resolved to smac and vlan id easily. Since we've added the netdevice
(ifindex and net) to the path record, storing the L2 attributes is
duplicated data and hence these attributes are removed.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 5c266b23 15-Oct-2015 Matan Barak <matanb@mellanox.com>

IB/cm: Remove the usage of smac and vid of qp_attr and cm_av

The cm and cma don't need to explicitly handle vlan and smac,
as they are resolved from the GID index now. Removing this
portion of code.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# abae1b71 15-Oct-2015 Matan Barak <matanb@mellanox.com>

IB/cma: cma_validate_port should verify the port and netdevice

Previously, cma_validate_port searched for GIDs in the IB cache and then
tried to verify the found port. This could fail when there are
identical GIDs on both ports. In addition, the netdevice should be taken
into account when searching the GID table.
Fix cma_validate_port to search only the relevant port's cache
and netdevice.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 55ee3ab2 15-Oct-2015 Matan Barak <matanb@mellanox.com>

IB/core: Add netdev and gid attributes parameters to cache

Adding the ability to query the IB cache by a netdev and get the
attributes of a GID. These parameters are necessary in order to
successfully resolve the required GID (when the netdevice is known)
and get the Ethernet L2 attributes from a GID.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# ab3964ad 20-Oct-2015 Haggai Eran <haggaie@mellanox.com>

IB/cma: Use inner P_Key to determine netdev

When discussing the patches to demux ids in rdma_cm instead of ib_cm, it
was decided that it is best to use the P_Key value in the packet headers.
However, the mlx5 and ipath drivers are currently unable to send correct
P_Key values in GMP headers. They always send using a single P_Key that is
set during the GSI QP initialization.

Change the rdma_cm code to look at the P_Key value that is part of the
packet payload as a workaround. Once the drivers are fixed this patch can
be reverted.

Fixes: 4c21b5bcef73 ("IB/cma: Add net_dev and private data checks to
RDMA CM")
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# b3b51f9f 21-Sep-2015 Haggai Eran <haggaie@mellanox.com>

IB/cma: Potential NULL dereference in cma_id_from_event

If the lookup of a listening ID failed for an AF_IB request, the code
would try to call dev_put() on a NULL net_dev.

Fixes: be688195bd08 ("IB/cma: Fix net_dev reference leak with failed
requests")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# b8cab5da 06-Oct-2015 Haggai Eran <haggaie@mellanox.com>

IB/cma: Accept connection without a valid netdev on RoCE

The netdev checks recently added to RDMA CM expect a valid netdev to be
found for both InfiniBand and RoCE, but the code that finds a netdev is
only implemented for InfiniBand.

Currently RoCE doesn't provide an API to find the netdev matching a
given set of parameters, so this patch just disables the netdev
enforcement for incoming connections when the link layer is RoCE.

Fixes: 4c21b5bcef73 ("IB/cma: Add net_dev and private data checks to RDMA CM")
Reported-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# be688195 27-Aug-2015 Haggai Eran <haggaie@mellanox.com>

IB/cma: Fix net_dev reference leak with failed requests

When no matching listening ID is found for a given request, the net_dev
that was used to find the request isn't released.

Fixes: 0b3ca768fcb0 ("IB/cma: Use found net_dev for passive connections")
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 51efe394 30-Jul-2015 Haggai Eran <haggaie@mellanox.com>

IB/cma: Share ib_cm_ids between rdma_cm_ids

Use ib_cm_insert_listen to create listening IB CM IDs or share existing
ones if needed. When given a request on a specific CM ID, the code now
matches the request to the RDMA CM ID based on the request parameters, so
it no longer needs to rely on the ib_cm's private data matching
capabilities.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 0b3ca768 30-Jul-2015 Haggai Eran <haggaie@mellanox.com>

IB/cma: Use found net_dev for passive connections

When receiving a new connection in cma_req_handler, we actually already
know the net_dev that is used for the connection's creation. Instead of
calling cma_translate_addr to resolve the new connection id's source
address, just use the net_dev that was found.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# f887f2ac 30-Jul-2015 Haggai Eran <haggaie@mellanox.com>

IB/cma: Validate routing of incoming requests

Pass incoming request parameters through the relevant IPv4/IPv6 routing
tables and make sure the network stack is configured to handle such
requests.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 4c21b5bc 30-Jul-2015 Haggai Eran <haggaie@mellanox.com>

IB/cma: Add net_dev and private data checks to RDMA CM

Instead of relying on the ib_cm module to check an incoming CM request's
private data header, add these checks to the RDMA CM module. This allows a
following patch to clean up the ib_cm interface and remove the code that
looks into the private headers. It will also allow supporting namespaces in
RDMA CM by making these checks namespace aware later on.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# aac978e1 30-Jul-2015 Haggai Eran <haggaie@mellanox.com>

IB/cma: Helper functions to access port space IDRs

Add helper functions to access the IDRs by port-space and port number.

Pass around the port-space enum in cma.c instead of using pointers to
port-space IDRs.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 0c505f70 30-Jul-2015 Haggai Eran <haggaie@mellanox.com>

IB/cma: Refactor RDMA IP CM private-data parsing code

When receiving a connection request, rdma_cm needs to associate the request
with a network device, in order to disambiguate requests. To do this, it
needs to know the request's destination IP. For this the module needs to
allow getting this information from the private data in the request packet,
instead of relying on the information already being in the listening RDMA
CM ID.

When creating a new incoming connection ID, the code in
cma_save_ip{4,6}_info can no longer rely on the listener's private data to
find the port number, so it reads it from the requested service ID.

Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 7c1eb45a 30-Jul-2015 Haggai Eran <haggaie@mellanox.com>

IB/core: lock client data with lists_rwsem

An ib_client callback that is called with the lists_rwsem locked only for
read is protected from changes to the IB client lists, but not from
ib_unregister_device() freeing its client data. This is because
ib_unregister_device() will remove the device from the device list with
lists_rwsem locked for write, but perform the rest of the cleanup,
including the call to remove() without that lock.

Mark client data that is undergoing de-registration with a new going_down
flag in the client data context. Lock the client data list with lists_rwsem
for write in addition to using the spinlock, so that functions calling the
callback would be able to lock only lists_rwsem for read and let callbacks
sleep.

Since ib_unregister_client() now marks the client data context, no need for
remove() to search the context again, so pass the client data directly to
remove() callbacks.

Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 6c26a771 13-Aug-2015 Spencer Baugh <sbaugh@catern.com>

RDMA/cma: fix IPv6 address resolution

Resolving a link-local IPv6 address with an unspecified source address
was broken by commit 5462eddd7a, which prevented the IPv6 stack from
learning the scope id of the link-local IPv6 address, causing random
failures as the IP stack chose a random link to resolve the address on.

Commit 5462eddd7a made us bail out of cma_check_linklocal early if
the address passed in was not an IPv6 link-local address. On the address
resolution path, the address passed in is the source address; if the
source address is the unspecified address, which is not link-local, we
will bail out early.

This is mostly correct, but if the destination address is a link-local
address, then we will be following a link-local route, and we'll need to
tell the IPv6 stack what the scope id of the destination address is.
This used to be done by last line of cma_check_linklocal, which is
skipped when bailing out early:

dev_addr->bound_dev_if = sin6->sin6_scope_id;

(In cma_bind_addr, the sin6_scope_id of the source address is set to the
sin6_scope_id of the destination address, so this is correct)
This line is required in turn for the following line, L279 of
addr6_resolve, to actually inform the IPv6 stack of the scope id:

fl6.flowi6_oif = addr->bound_dev_if;

Since we can only know we are in this failure case when we have access
to both the source IPv6 address and destination IPv6 address, we have to
deal with this further up the stack. So detect this failure case in
cma_bind_addr, and set bound_dev_if to the destination address scope id
to correct it.

Signed-off-by: Spencer Baugh <sbaugh@catern.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 68cdba06 18-May-2015 Steve Wise <larrystevenwise@gmail.com>

RDMA/iw_cm: Export tos field to iwarp providers

rdma-cma/iw_cm: Export tos field to iwarp providers

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# c07678bb 19-May-2015 Matthew Finlay <Matt@Mellanox.com>

IB/cma: Fix broken AF_IB UD support

Support for using UD and AF_IB is currently broken. The
IB_CM_SIDR_REQ_RECEIVED message is not handled properly in
cma_save_net_info() and we end up falling into code that will try to
process the request as IPv4/IPv6, which will end up failing.

The resolution is to add a check for the SIDR_REQ and call
cma_save_ib_info() with a NULL path record. Change cma_save_ib_info()
to copy the src sib info from the listen_id when the path record is NULL.

Reported-by: Hari Shankar <Hari.Shankar@netapp.com>
Signed-off-by: Matt Finlay <matt@mellanox.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 5d9fb044 14-May-2015 Ira Weiny <ira.weiny@intel.com>

IB/core: Change rdma_protocol_iboe to roce

After discussion upstream, it was agreed to transition the usage of iboe
in the kernel to roce. This keeps our terminology consistent with what
was finalized in the IBTA Annex 16 and IBTA Annex 17 publications.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 2b1b5b60 18-May-2015 Sagi Grimberg <sagig@mellanox.com>

IB/core, cma: Nice log-friendly string helpers

Some of us keep revisiting the code to decode enumerations that
appear in our logs. Let's borrow the nice logging helpers that
exist in xprtrdma and rds for CMA events, IB events and WC statuses.

Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 227128fc 05-May-2015 Michael Wang <yun.wang@profitbricks.com>

IB/Verbs: Use management helper rdma_cap_eth_ah()

Introduce helper rdma_cap_eth_ah() to help us check if the port of an
IB device supports Ethernet Address Handles.

Signed-off-by: Michael Wang <yun.wang@profitbricks.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 30a74ef4 05-May-2015 Michael Wang <yun.wang@profitbricks.com>

IB/Verbs: Use management helper rdma_cap_af_ib()

Introduce helper rdma_cap_af_ib() to help us check if the port of an
IB device supports Native InfiniBand Addressing.

Signed-off-by: Michael Wang <yun.wang@profitbricks.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# a31ad3b0 05-May-2015 Michael Wang <yun.wang@profitbricks.com>

IB/Verbs: Use management helper rdma_cap_ib_mcast()

Introduce helper rdma_cap_ib_mcast() to help us check if the port of an
IB device supports InfiniBand Multicast.

Signed-off-by: Michael Wang <yun.wang@profitbricks.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# fe53ba2f 05-May-2015 Michael Wang <yun.wang@profitbricks.com>

IB/Verbs: Use management helper rdma_cap_ib_sa()

Introduce helper rdma_cap_ib_sa() to help us check if the port of an
IB device supports InfiniBand Subnet Administration.

Signed-off-by: Michael Wang <yun.wang@profitbricks.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 04215330 05-May-2015 Michael Wang <yun.wang@profitbricks.com>

IB/Verbs: Use management helper rdma_cap_iw_cm()

Introduce helper rdma_cap_iw_cm() to help us check if the port of an
IB device supports the iWARP Communication Manager.

Signed-off-by: Michael Wang <yun.wang@profitbricks.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 72219cea 05-May-2015 Michael Wang <yun.wang@profitbricks.com>

IB/Verbs: Use management helper rdma_cap_ib_cm()

Introduce helper rdma_cap_ib_cm() to help us check if the port of an
IB device supports the InfiniBand Communication Manager.

Signed-off-by: Michael Wang <yun.wang@profitbricks.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# fef60902 05-May-2015 Michael Wang <yun.wang@profitbricks.com>

IB/Verbs: Reform rest part in IB-core cma

Use raw management helpers to reform the remaining parts of IB-core cma.

Signed-off-by: Michael Wang <yun.wang@profitbricks.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 7c11147d 05-May-2015 Michael Wang <yun.wang@profitbricks.com>

IB/Verbs: Reform cma_acquire_dev()

Reform cma_acquire_dev() with management helpers and introduce
cma_validate_port() to make the code cleaner.

Signed-off-by: Michael Wang <yun.wang@profitbricks.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 5c9a5282 05-May-2015 Michael Wang <yun.wang@profitbricks.com>

IB/Verbs: Reform mcast related part in IB-core cma

Use raw management helpers to reform the mcast-related part of IB-core cma.

Signed-off-by: Michael Wang <yun.wang@profitbricks.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# c72f2189 05-May-2015 Michael Wang <yun.wang@profitbricks.com>

IB/Verbs: Reform route related part in IB-core cma

Use raw management helpers to reform the route-related part of IB-core cma.

Signed-off-by: Michael Wang <yun.wang@profitbricks.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 21655afc 05-May-2015 Michael Wang <yun.wang@profitbricks.com>

IB/Verbs: Reform cm related part in IB-core cma/ucm

Use raw management helpers to reform the cm-related part of IB-core cma/ucm.

A few checks focus on the device's CM type rather than the port
capability; directly passing port 1 works currently, but this can't
support devices that mix CM types in the future.

Signed-off-by: Michael Wang <yun.wang@profitbricks.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 28521440 20-Apr-2015 Jason Gunthorpe <jgg@ziepe.ca>

RDMA/CMA: Canonize IPv4 on IPV6 sockets properly

When accepting a new IPv4 connect to an IPv6 socket, the CMA tries to
canonize the address family to IPv4, but does not properly process
the listening sockaddr to get the listening port, and does not properly
set the address family of the canonized sockaddr.
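
A minimal userspace sketch of the two missing pieces (hypothetical helper,
not the kernel code): take the port from the IPv6 listening sockaddr and set
the family of the canonized sockaddr explicitly.

#include <netinet/in.h>
#include <string.h>

/* Build an IPv4 sockaddr for a connection that arrived on an IPv6
 * listener: keep the listener's port and set the family explicitly. */
static void canonize_ipv4(const struct sockaddr_in6 *listen6,
			  struct in_addr peer_v4,
			  struct sockaddr_in *out)
{
	memset(out, 0, sizeof(*out));
	out->sin_family = AF_INET;		/* was left unset in the bug */
	out->sin_port = listen6->sin6_port;	/* taken from the IPv6 listener */
	out->sin_addr = peer_v4;
}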

Fixes: e51060f08a61 ("IB: IP address based RDMA connection manager")

Cc: <stable@vger.kernel.org>
Reported-By: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 30dc5e63 26-Mar-2014 Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>

RDMA/core: Add support for iWARP Port Mapper user space service

This patch adds iWARP Port Mapper (IWPM) Version 2 support. The iWARP
Port Mapper implementation is based on the port mapper specification
section in the Sockets Direct Protocol paper -
http://www.rdmaconsortium.org/home/draft-pinkerton-iwarp-sdp-v1.0.pdf

Existing iWARP RDMA providers use the same IP address as the native
TCP/IP stack when creating RDMA connections. They need a mechanism to
claim the TCP ports used for RDMA connections to prevent TCP port
collisions when other host applications use TCP ports. The iWARP Port
Mapper provides a standard mechanism to accomplish this. Without this
service it is possible for an RDMA application to bind/listen on the same
port which is already being used by a native TCP host application. If
that happens, the incoming TCP connection data can be passed to the
RDMA stack in error.

The iWARP Port Mapper solution doesn't contain any changes to the
existing network stack in the kernel space. All the changes are
contained with the infiniband tree and also in user space.

The iWARP Port Mapper service is implemented as a user space daemon
process. Source for the IWPM service is located at
http://git.openfabrics.org/git?p=~tnikolova/libiwpm-1.0.0/.git;a=summary

The iWARP driver (port mapper client) sends to the IWPM service the
local IP address and TCP port it has received from the RDMA
application, when starting a connection. The IWPM service performs a
socket bind from user space to get an available TCP port, called a
mapped port, and communicates it back to the client. In that sense,
the IWPM service is used to map the TCP port which the RDMA
application uses to any port available from the host TCP port
space. The mapped ports are used in iWARP RDMA connections to avoid
collisions with the native TCP stack, which is aware that these ports
are taken. When an RDMA connection using a mapped port is terminated, the
client notifies the IWPM service, which then releases the TCP port.

The message exchange between the IWPM service and the iWARP drivers
(between user space and kernel space) is implemented using netlink
sockets.

1) Netlink interface functions are added: ibnl_unicast() and
ibnl_multicast() for sending netlink messages to user space

2) The signature of the existing ibnl_put_msg() is changed to be more
generic

3) Two netlink clients are added: RDMA_NL_NES, RDMA_NL_C4IW
corresponding to the two iWarp drivers - nes and cxgb4 which use
the IWPM service

4) Enums are added to enumerate the attributes in the netlink
messages, which are exchanged between the user space IWPM service
and the iWARP drivers

Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: PJ Waskiewicz <pj.waskiewicz@solidfire.com>

[ Fold in range checking fixes and nlh_next removal as suggested by Dan
Carpenter and Steve Wise. Fix sparse endianness in hash. - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>


# b2853fd6 27-Mar-2014 Moni Shoua <monis@mellanox.com>

IB/core: Don't resolve passive side RoCE L2 address in CMA REQ handler

The code that resolves the passive side source MAC within the rdma_cm
connection request handler was both redundant and buggy, so remove it.

It was redundant since later, when an RC QP is modified to RTR state,
the resolution will take place in the ib_core module. It was buggy
because this callback also deals with UD SIDR exchange, for which we
incorrectly looked at the REQ member of the CM event and dereferenced
a random value.

Fixes: dd5f03beb4f7 ("IB/core: Ethernet L2 attributes in verbs/cm structures")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 5462eddd 15-Nov-2013 Somnath Kotur <somnath.kotur@emulex.com>

RDMA/cma: Handle global/non-linklocal IPv6 addresses in cma_check_linklocal()

If addr is not a linklocal address, the code incorrectly fails to
return and ends up assigning the scope ID to the scope id of the
address, which is wrong. Fix by checking if it's a link local address
first, and immediately return 0 if not.

Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 7b85627b 12-Dec-2013 Moni Shoua <monis@mellanox.com>

IB/cma: IBoE (RoCE) IP-based GID addressing

Currently, the IB core and specifically the RDMA-CM assumes that IBoE
(RoCE) gids encode related Ethernet netdevice interface MAC address
and possibly VLAN id.

Change GIDs to be treated as encoding the interface IP address.

Since Ethernet layer 2 address parameters are no longer encoded
within gids, we have to extend the Infiniband address structures (e.g.
ib_ah_attr) with layer 2 address parameters, namely mac and vlan.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 63862b5b 11-Jan-2014 Aruna-Hewapathirane <aruna.hewapathirane@gmail.com>

net: replace macros net_random and net_srandom with direct calls to prandom

This patch removes the net_random and net_srandom macros and replaces
them with direct calls to the prandom ones. As new commits only seem to
use prandom_u32 there is no use to keep them around.
This change makes it easier to grep for users of prandom_u32.

Signed-off-by: Aruna-Hewapathirane <aruna.hewapathirane@gmail.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# dd5f03be 12-Dec-2013 Matan Barak <matanb@mellanox.com>

IB/core: Ethernet L2 attributes in verbs/cm structures

This patch add the support for Ethernet L2 attributes in the
verbs/cm/cma structures.

When dealing with L2 Ethernet, we should use smac, dmac, vlan ID and priority
in a similar manner that the IB L2 (and the L4 PKEY) attributes are used.

Thus, those attributes were added to the following structures:

* ib_ah_attr - added dmac
* ib_qp_attr - added smac and vlan_id, (sl remains vlan priority)
* ib_wc - added smac, vlan_id
* ib_sa_path_rec - added smac, dmac, vlan_id
* cm_av - added smac and vlan_id

For the path record structure, extra care was taken to avoid the new
fields when packing it into wire format, so we don't break the IB CM
and SA wire protocol.

On the active side, the CM fills its internal structures from the
path provided by the ULP. We add code there to take the ETH L2 attributes
and place them into the CM Address Handle (struct cm_av).

On the passive side, the CM fills its internal structures from the WC
associated with the REQ message. We add code there to take the ETH L2
attributes from the WC.

When the HW driver provides the required ETH L2 attributes in the WC,
it sets the IB_WC_WITH_SMAC and IB_WC_WITH_VLAN flags. The IB core
code checks for the presence of these flags, and in their absence does
address resolution from the ib_init_ah_from_wc() helper function.

ib_modify_qp_is_ok is also updated to consider the link layer. Some
parameters are mandatory for Ethernet link layer, while they are
irrelevant for IB. Vendor drivers are modified to support the new
function signature.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 352b9056 10-Nov-2013 Michal Nazarewicz <mina86@mina86.com>

RDMA/cma: Remove unused argument and minor dead code

The dev variable is never assigned after being initialised.

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# be9130cc 24-Sep-2013 Doug Ledford <dledford@redhat.com>

IB/cma: Check for GID on listening device first

As a simple optimization that should speed up the vast majority of
connect attempts on IB devices, when we are searching for the GID of an
incoming connection in the cached GID lists of devices, search the
device that received the incoming connection request first. If we
don't find it there, then move on to other devices.

This reduces the time to perform 10,000 connections considerably.
Prior to this patch, a bad run of cmtime would look like this:

connect : 12399.26 12351.10 8609.00 1239.93

With this patch, it looks more like this:

connect : 5864.86 5799.80 8876.00 586.49
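
For illustration only, a standalone sketch of the reordered search
(hypothetical types, not the kernel's cached GID table code):

#include <stddef.h>
#include <string.h>

struct gid { unsigned char raw[16]; };
struct dev { struct gid gids[8]; size_t ngids; };

static int dev_has_gid(const struct dev *d, const struct gid *g)
{
	for (size_t i = 0; i < d->ngids; i++)
		if (!memcmp(&d->gids[i], g, sizeof(*g)))
			return 1;
	return 0;
}

/* Prefer the device the request arrived on; fall back to scanning the rest. */
static const struct dev *find_gid_dev(const struct dev *devs, size_t n,
				      const struct dev *req_dev,
				      const struct gid *g)
{
	if (req_dev && dev_has_gid(req_dev, g))
		return req_dev;
	for (size_t i = 0; i < n; i++)
		if (&devs[i] != req_dev && dev_has_gid(&devs[i], g))
			return &devs[i];
	return NULL;
}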

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 29f27e84 24-Sep-2013 Doug Ledford <dledford@redhat.com>

IB/cma: Use cached gids

The cma_acquire_dev function was changed by commit 3c86aa70bf67
("RDMA/cm: Add RDMA CM support for IBoE devices") to use find_gid_port()
because multiport devices might have either IB or IBoE formatted gids.
The old function assumed that all ports on the same device used the
same GID format.

However, when it was changed to use find_gid_port(), we inadvertently
lost usage of the GID cache. This turned out to be a very costly
change. In our testing, each iteration through each index of the GID
table takes roughly 35us. When you have multiple devices in a system,
and the GID you are looking for is on one of the later devices, the
code loops through all of the GID indexes on all of the early devices
before it finally succeeds on the target device. This pathological
search behavior combined with 35us per GID table index retrieval
produces results such as the following from the cmtime application
that's part of the latest librdmacm git repo:

ib1:
step total ms max ms min us us / conn
create id : 29.42 0.04 1.00 2.94
bind addr : 186705.66 19.00 18556.00 18670.57
resolve addr : 41.93 9.68 619.00 4.19
resolve route: 486.93 0.48 101.00 48.69
create qp : 4021.95 6.18 330.00 402.20
connect : 68350.39 68588.17 24632.00 6835.04
disconnect : 1460.43 252.65-1862269.00 146.04
destroy : 41.16 0.04 2.00 4.12

ib0:
step total ms max ms min us us / conn
create id : 28.61 0.68 1.00 2.86
bind addr : 2178.86 2.95 201.00 217.89
resolve addr : 51.26 16.85 845.00 5.13
resolve route: 620.08 0.43 92.00 62.01
create qp : 3344.40 6.36 273.00 334.44
connect : 6435.99 6368.53 7844.00 643.60
disconnect : 5095.38 321.90 757.00 509.54
destroy : 37.13 0.02 2.00 3.71

Clearly, both the bind address and connect operations suffer
a huge penalty for being anything other than the default
GID on the first port in the system.

After applying this patch, the numbers now look like this:

ib1:
step total ms max ms min us us / conn
create id : 30.15 0.03 1.00 3.01
bind addr : 80.27 0.04 7.00 8.03
resolve addr : 43.02 13.53 589.00 4.30
resolve route: 482.90 0.45 100.00 48.29
create qp : 3986.55 5.80 330.00 398.66
connect : 7141.53 7051.29 5005.00 714.15
disconnect : 5038.85 193.63 918.00 503.88
destroy : 37.02 0.04 2.00 3.70

ib0:
step total ms max ms min us us / conn
create id : 34.27 0.05 1.00 3.43
bind addr : 26.45 0.04 1.00 2.64
resolve addr : 38.25 10.54 760.00 3.82
resolve route: 604.79 0.43 97.00 60.48
create qp : 3314.95 6.34 273.00 331.49
connect : 12399.26 12351.10 8609.00 1239.93
disconnect : 5096.76 270.72 1015.00 509.68
destroy : 37.10 0.03 2.00 3.71

It's worth noting that we still suffer a bit of a penalty on
connect to the wrong device, but the penalty is much less than
it used to be. Follow on patches deal with this penalty.

Many thanks to Neil Horman for helping to track down the slow
function, which allowed us to see that the original patch I mentioned
above backed out cache usage and to identify just how much that
impacted the system.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# eb072c4b 06-Nov-2013 Eyal Perry <eyalpe@mellanox.com>

RDMA/cma: Set IBoE SL (user-priority) by egress map when using vlans

On top of commit 366cddb40 "IB/rdma_cm: TOS <=> UP mapping for IBoE", add
support for the case where a vlan egress map is used.

When the IBoE session is being set up over a vlan, inherit the socket priority
to vlan priority mapping which was configured for the vlan device egress map.

Signed-off-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0bbf87d8 28-Sep-2013 Eric W. Biederman <ebiederm@xmission.com>

net ipv4: Convert ipv4.ip_local_port_range to be per netns v3

- Move sysctl_local_ports from a global variable into struct netns_ipv4.
- Modify inet_get_local_port_range to take a struct net, and update all
of the callers.
- Move the initialization of sysctl_local_ports into
sysctl_net_ipv4.c:ipv4_sysctl_init_net from inet_connection_sock.c

v2:
- Ensure indentation used tabs
- Fixed ip.h so it applies cleanly to todays net-next

v3:
- Compile fixes of strange callers of inet_get_local_port_range.
This patch now successfully passes an allmodconfig build.
Removed manual inlining of inet_get_local_port_range in ipv4_local_port_range

Originally-by: Samya <samya@twitter.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 24d44a39 04-Jul-2013 Steve Wise <larrystevenwise@gmail.com>

RDMA/cma: Add IPv6 support for iWARP

Modify the type of local_addr and remote_addr fields in struct
iw_cm_id from struct sockaddr_in to struct sockaddr_storage to hold
IPv6 and IPv4 addresses uniformly.

Change the references of local_addr and remote_addr in cxgb4, cxgb3,
nes and amso drivers to match this. However, to be able to actually run
traffic over IPv6, low-level drivers have to add code to support this.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>

[ Fix unused variable warnings when INFINIBAND_NES_DEBUG not set.
- Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>


# 5eb695c1 24-Jul-2013 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Only call cma_save_ib_info() for CM REQs

Calling cma_save_ib_info() for CM SIDR REQs results in a crash
accessing an invalid path record pointer.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# e511d1ae 24-Jul-2013 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Fix accessing invalid private data for UD

If a application is using AF_IB with a UD QP, but does not provide any
private data, we will end up accessing invalid memory. Check for this
case and handle it appropriately.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 8fb488d7 24-Jul-2013 Paul Bolle <pebolle@tiscali.nl>

RDMA/cma: Fix gcc warning

Building cma.o triggers this gcc warning:

drivers/infiniband/core/cma.c: In function ‘rdma_resolve_addr’:
drivers/infiniband/core/cma.c:465:23: warning: ‘port’ may be used uninitialized in this function [-Wmaybe-uninitialized]
drivers/infiniband/core/cma.c:426:5: note: ‘port’ was declared here

This is a false positive, as "port" will always be initialized if we're
at "found". But if we assign to "id_priv->id.port_num" directly, we can
drop "port". That will, obviously, silence gcc.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 8c6ffba0 14-Jul-2013 Rusty Russell <rusty@rustcorp.com.au>

PTR_RET is now PTR_ERR_OR_ZERO(): Replace most.

Sweep of the simple cases.

Cc: netdev@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# ce117ffa 29-May-2013 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Export AF_IB statistics

Report AF_IB source and destination addresses through netlink
interface.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 5bc2b7b3 29-May-2013 Sean Hefty <sean.hefty@intel.com>

RDMA/ucma: Allow user space to specify AF_IB when joining multicast

Allow user space applications to join multicast groups using MGIDs
directly. MGIDs may be passed using AF_IB addresses. Since the
current multicast join command only supports addresses as large as
sockaddr_in6, define a new structure for joining addresses specified
using sockaddr_ib.

Since AF_IB allows the user to specify the qkey when resolving a
remote UD QP address, when joining the multicast group use the qkey
value, if one has been assigned.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# cf53936f 29-May-2013 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Export cma_get_service_id()

Allow the rdma_ucm to query the IB service ID formed or allocated by
the rdma_cm by exporting the cma_get_service_id() functionality.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 94d0c939 29-May-2013 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Only listen on IB devices when using AF_IB

If an rdma_cm_id is bound to AF_IB, with a wild card address, only
listen on IB devices.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 5c438135 29-May-2013 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Set qkey for AF_IB

Allow the user to specify the qkey when using AF_IB. The qkey is
added to struct rdma_ucm_conn_param in place of a reserved field, but
for backwards compatibility, is only accessed if the associated
rdma_cm_id is using AF_IB.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# e8160e15 29-May-2013 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Expose private data when using AF_IB

If the source or destination address is AF_IB, then do not reserve a
portion of the private data in the IB CM REQ or SIDR REQ messages for
the cma header. Instead, all private data should be exported to the
user. When AF_IB is used, the rdma cm does not have sufficient
information to fill in the cma header. Additionally, this will be
necessary to support any IB connection through the rdma cm interface.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# fbaa1a6d 29-May-2013 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Merge cma_get/save_net_info

With the removal of SDP related code, we can merge cma_get_net_info()
with cma_save_net_info(), since we're only ever dealing with a single
header format.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 01602f11 29-May-2013 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Remove unused SDP related code

The SDP protocol was never merged upstream. Remove unused SDP related
code from the RDMA CM.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 496ce3ce 29-May-2013 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Add support for AF_IB to cma_get_service_id()

cma_get_service_id() forms the service ID based on the port space and
port number of the rdma_cm_id. Extend the call to support AF_IB,
which contains the service ID directly. This will be needed to
support any arbitrary SID.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# f68194ca 29-May-2013 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Add support for AF_IB to rdma_resolve_route()

Allow rdma_resolve_route() to handle the case where the user specified
the source and destination addresses using AF_IB.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# f17df3b0 29-May-2013 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Add support for AF_IB to rdma_resolve_addr()

Allow the user to specify the remote address using AF_IB format. When
AF_IB is used, the remote address simply needs to be recorded, and no
resolution using ARP is done. The local address may still need to be
matched with a local IB device.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 4ae7152e 29-May-2013 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Verify that source and dest sa_family are the same

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# b0569e40 29-May-2013 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Restrict AF_IB loopback to binding to IB devices only

If a user specifies AF_IB as the source address for a loopback
connection, limit the resolution to IB devices only.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# f4753834 29-May-2013 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Add helper functions to return id address information

Provide inline helpers to extract source and destination address data
from the rdma_cm_id.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 6a3e362d 29-May-2013 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Do not modify sa_family when setting loopback address

cma_resolve_loopback is called after an rdma_cm_id has been
bound to a specific sa_family and port. Once the
source sa_family for the id has been set, do not modify it.
Only the actual IP address portion of the source address
needs to be set.

As part of this fix, we can simplify setting the source address
by moving the loopback address assignment from cma_resolve_loopback
to cma_bind_loopback. cma_bind_loopback is only invoked when
the source address is the loopback address.

Finally, add loopback support for AF_IB as part of the change.
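
A standalone sketch of the intent (not the kernel code): fill in only the
address bytes for the family the id is already bound to, never the family
itself.

#include <arpa/inet.h>
#include <netinet/in.h>

/* Assign the loopback address without changing the sa_family that the id
 * was already bound to. */
static void set_loopback(struct sockaddr *addr)
{
	switch (addr->sa_family) {
	case AF_INET:
		((struct sockaddr_in *)addr)->sin_addr.s_addr =
			htonl(INADDR_LOOPBACK);
		break;
	case AF_INET6:
		((struct sockaddr_in6 *)addr)->sin6_addr = in6addr_loopback;
		break;
	default:
		break;	/* e.g. AF_IB loopback is handled separately */
	}
}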

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 680f920a 29-May-2013 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Allow user to specify AF_IB when binding

Modify rdma_bind_addr to allow the user to specify AF_IB when binding
to a device. AF_IB indicates that the user is not mapping an IP
address to the native IB addressing. (The mapping may have already
been done, or is not needed)

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 58afdcb7 29-May-2013 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Update port reservation to support AF_IB

The AF_IB uses a 64-bit service id (SID), which the user can control
through the use of a mask. The rdma_cm will assign values to the
unmasked portions of the SID based on the selected port space and port
number.

Because the IB spec divides the SID range into several regions, a
SID/mask combination may fall into one of the existing port space
ranges as defined by the RDMA CM IP Annex. Map the AF_IB SID to the
correct RDMA port space.
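
For illustration, the masking amounts to something like the following
(hypothetical helper, not the kernel function):

#include <stdint.h>

/* The user controls the bits covered by 'mask'; the rdma_cm fills in the
 * remaining bits, e.g. from the selected port space and port number. */
static uint64_t form_service_id(uint64_t user_sid, uint64_t mask,
				uint64_t cm_assigned)
{
	return (user_sid & mask) | (cm_assigned & ~mask);
}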

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# ef560861 29-May-2013 Sean Hefty <sean.hefty@intel.com>

IB/addr: Add AF_IB support to ip_addr_size

Add support for AF_IB to ip_addr_size, and rename the function to
account for the change. Give the compiler more control over whether
the call should be inline or not by moving the definition into the .c
file, removing the static inline, and exporting it.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 2e2d190c 29-May-2013 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Include AF_IB in loopback and any address checks

Enhance checks for loopback and any address to support AF_IB in
addition to AF_INET and AF_INET6. This will allow future patches to
use AF_IB when binding and resolving addresses.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# c8dea2f9 29-May-2013 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Allow enabling reuseaddr in any state

The rdma_cm only allows setting reuseaddr if the corresponding
rdma_cm_id is in the idle state. Allow setting this value in other
states. This brings the behavior more inline with sockets.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 351638e7 27-May-2013 Jiri Pirko <jiri@resnulli.us>

net: pass info struct via netdevice notifier

So far, only net_device * could be passed along with netdevice notifier
event. This patch provides the possibility to pass a custom structure
able to provide info that the event listener needs to know.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>

v2->v3: fix typo on simeth
shortened dev_getter
shortened notifier_info struct name
v1->v2: fix notifier_call parameter in call_netdevice_notifier()
Signed-off-by: David S. Miller <davem@davemloft.net>


# b67bfe0d 27-Feb-2013 Sasha Levin <sasha.levin@oracle.com>

hlist: drop the node parameter from iterators

I'm not sure why, but the hlist for-each-entry iterators were conceived
differently from the list ones, which look like:

list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
do they not really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.

Besides the semantic patch, there was some manual work required:

- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small number of places were using the 'node' parameter; these
were modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;
<+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
...+>

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foundation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3b069c5d 27-Feb-2013 Tejun Heo <tj@kernel.org>

IB/core: convert to idr_alloc()

Convert to the much saner new idr interface.

v2: Mike triggered WARN_ON() in idr_preload() because send_mad(),
which may be used from non-process context, was calling
idr_preload() unconditionally. Preload iff @gfp_mask has
__GFP_WAIT.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reported-by: "Marciniszyn, Mike" <mike.marciniszyn@intel.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 63f05be2 28-Nov-2012 shefty <sean.hefty@intel.com>

RDMA/cm: Change return value from find_gid_port()

Problem reported by Dan Carpenter <dan.carpenter@oracle.com>:

The patch 3c86aa70bf67: "RDMA/cm: Add RDMA CM support for IBoE
devices" from Oct 13, 2010, leads to the following warning:
net/sunrpc/xprtrdma/svc_rdma_transport.c:722 svc_rdma_create()
error: passing non neg 1 to ERR_PTR

This bug would result in a NULL dereference. svc_rdma_create() is
supposed to return ERR_PTRs or valid pointers, but instead it returns
ERR_PTRs, valid pointers and 1.

The call tree is:

svc_rdma_create()
=> rdma_bind_addr()
=> cma_acquire_dev()
=> find_gid_port()

rdma_bind_addr() should return a valid errno. Fix this by having
find_gid_port() also return a valid errno. If we can't find the
specified GID on a given port, return -EADDRNOTAVAIL, rather than
-EAGAIN, to better indicate the error. We also drop using the
special return value of '1' and instead pass through the error
returned by the underlying verbs call. On such errors, rather
than aborting the search, we simply continue to check the next
device/port.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 809d5fc9 04-Oct-2012 Gao feng <gaofeng@cn.fujitsu.com>

infiniband: pass rdma_cm module to netlink_dump_start

Set netlink_dump_control.module to avoid a panic.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4ede178a 03-Oct-2012 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Check that retry count values are in range

The retry_count and rnr_retry_count connection parameters are both
3-bit values. Check that the values are in range and reduce if
they're not.

This fixes a problem reported by Doug Ledford <dledford@redhat.com>
that resulted in the userspace rping test (part of the librdmacm
samples) failing to run over Intel IB HCAs.
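
For illustration, the clamp amounts to the following standalone sketch
(the kernel uses min_t() for the same effect):

#include <stdio.h>

/* retry_count and rnr_retry_count travel in 3-bit fields, so any
 * user-supplied value has to be reduced to at most 7. */
static unsigned char clamp_retry(unsigned int requested)
{
	return requested > 7 ? 7 : (unsigned char)requested;
}

int main(void)
{
	printf("%u -> %u\n", 200u, (unsigned)clamp_retry(200));	/* 200 -> 7 */
	return 0;
}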

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

[ Use min_t() to avoid warnings about type mismatch. - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>


# 2a22fb8c 30-Aug-2012 Dotan Barak <dotanb@dev.mellanox.co.il>

RDMA/cma: Use consistent component mask for IPoIB port space multicast joins

CMA multicast joins for the IPoIB port space need to use the same
component mask used by the ipoib driver. Otherwise, it's possible for
the CMA to create a group to which a join made by ipoib will fail, or
vice versa. Some of the component mask fields set by ipoib weren't
set by the CMA; fix that.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 4e289045 27-Jul-2012 Fengguang Wu <fengguang.wu@intel.com>

RDMA/cma: Use PTR_RET rather than if (IS_ERR(...)) + PTR_ERR

Suggested by scripts/coccinelle/api/ptr_ret.cocci.

Signed-off-by: Roland Dreier <roland@purestorage.com>


# d90f9b35 05-Jul-2012 Roland Dreier <roland@purestorage.com>

IB: Use IS_ENABLED(CONFIG_IPV6)

Instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)

Signed-off-by: Roland Dreier <roland@purestorage.com>


# 68602120 14-Jun-2012 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Allow user to restrict listens to bound address family

Provide an option for the user to specify that listens should only
accept connections where the incoming address family matches that of
the locally bound address. This is used to support the equivalent of
IPV6_V6ONLY socket option, which allows an app to only accept
connection requests directed to IPv6 addresses.
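
The behaviour being mirrored is the sockets IPV6_V6ONLY option; for
reference, the plain-sockets equivalent looks like this:

#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	int fd = socket(AF_INET6, SOCK_STREAM, 0);
	int on = 1;

	/* Accept only connections addressed to IPv6 addresses; IPv4 peers
	 * are no longer delivered as v4-mapped addresses on this socket. */
	if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &on, sizeof(on)) < 0)
		perror("setsockopt");
	close(fd);
	return 0;
}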

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 406b6a25 14-Jun-2012 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Listen on specific address family

The rdma_cm maps IPv4 and IPv6 addresses to the same service ID. This
prevents apps from listening only for IPv4 or IPv6 addresses. It also
results in an app binding to an IPv4 address receiving connection
requests for an IPv6 address.

Change this to match socket behavior: restrict listens on IPv4
addresses to only IPv4 addresses, and if a listen is on an IPv6
address, allow it to receive either IPv4 or IPv6 addresses, based on
its address family binding.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 5b0ec991 14-Jun-2012 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Bind to a specific address family

The RDMA CM uses a single port space for all associated (tcp, udp,
etc.) port bindings, regardless of the address family that the user
binds to. The result is that if a user binds to AF_INET, but does not
specify an IP address, the bind will occur for AF_INET6. This causes
an attempt to bind to the same port using AF_INET6 to fail, and
connection requests to AF_INET6 will match with the AF_INET listener.
Align the behavior with sockets and restrict the bind to AF_INET only.

If a user binds to AF_INET6, we bind the port to AF_INET6 and
AF_INET depending on the value of bindv6only.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 4dd81e89 14-Jun-2012 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: QP type check on received REQs should be AND not OR

Change || check to the intended && when checking the QP type in a
received connection request against the listening endpoint.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# b6cec8aa 25-Apr-2012 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Fix lockdep false positive recursive locking

The following lockdep problem was reported by Or Gerlitz <ogerlitz@mellanox.com>:

[ INFO: possible recursive locking detected ]
3.3.0-32035-g1b2649e-dirty #4 Not tainted
---------------------------------------------
kworker/5:1/418 is trying to acquire lock:
(&id_priv->handler_mutex){+.+.+.}, at: [<ffffffffa0138a41>] rdma_destroy_id+0x33/0x1f0 [rdma_cm]

but task is already holding lock:
(&id_priv->handler_mutex){+.+.+.}, at: [<ffffffffa0135130>] cma_disable_callback+0x24/0x45 [rdma_cm]

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&id_priv->handler_mutex);
lock(&id_priv->handler_mutex);

*** DEADLOCK ***

May be due to missing lock nesting notation

3 locks held by kworker/5:1/418:
#0: (ib_cm){.+.+.+}, at: [<ffffffff81042ac1>] process_one_work+0x210/0x4a6
#1: ((&(&work->work)->work)){+.+.+.}, at: [<ffffffff81042ac1>] process_one_work+0x210/0x4a6
#2: (&id_priv->handler_mutex){+.+.+.}, at: [<ffffffffa0135130>] cma_disable_callback+0x24/0x45 [rdma_cm]

stack backtrace:
Pid: 418, comm: kworker/5:1 Not tainted 3.3.0-32035-g1b2649e-dirty #4
Call Trace:
[<ffffffff8102b0fb>] ? console_unlock+0x1f4/0x204
[<ffffffff81068771>] __lock_acquire+0x16b5/0x174e
[<ffffffff8106461f>] ? save_trace+0x3f/0xb3
[<ffffffff810688fa>] lock_acquire+0xf0/0x116
[<ffffffffa0138a41>] ? rdma_destroy_id+0x33/0x1f0 [rdma_cm]
[<ffffffff81364351>] mutex_lock_nested+0x64/0x2ce
[<ffffffffa0138a41>] ? rdma_destroy_id+0x33/0x1f0 [rdma_cm]
[<ffffffff81065a78>] ? trace_hardirqs_on_caller+0x11e/0x155
[<ffffffff81065abc>] ? trace_hardirqs_on+0xd/0xf
[<ffffffffa0138a41>] rdma_destroy_id+0x33/0x1f0 [rdma_cm]
[<ffffffffa0139c02>] cma_req_handler+0x418/0x644 [rdma_cm]
[<ffffffffa012ee88>] cm_process_work+0x32/0x119 [ib_cm]
[<ffffffffa0130299>] cm_req_handler+0x928/0x982 [ib_cm]
[<ffffffffa01302f3>] ? cm_req_handler+0x982/0x982 [ib_cm]
[<ffffffffa0130326>] cm_work_handler+0x33/0xfe5 [ib_cm]
[<ffffffff81065a78>] ? trace_hardirqs_on_caller+0x11e/0x155
[<ffffffffa01302f3>] ? cm_req_handler+0x982/0x982 [ib_cm]
[<ffffffff81042b6e>] process_one_work+0x2bd/0x4a6
[<ffffffff81042ac1>] ? process_one_work+0x210/0x4a6
[<ffffffff813669f3>] ? _raw_spin_unlock_irq+0x2b/0x40
[<ffffffff8104316e>] worker_thread+0x1d6/0x350
[<ffffffff81042f98>] ? rescuer_thread+0x241/0x241
[<ffffffff81046a32>] kthread+0x84/0x8c
[<ffffffff8136e854>] kernel_thread_helper+0x4/0x10
[<ffffffff81366d59>] ? retint_restore_args+0xe/0xe
[<ffffffff810469ae>] ? __init_kthread_worker+0x56/0x56
[<ffffffff8136e850>] ? gs_change+0xb/0xb

The actual locking is fine, since we're dealing with different locks,
but from the same lock class. cma_disable_callback() acquires the
listening id mutex, whereas rdma_destroy_id() acquires the mutex for
the new connection id. To fix this, delay the call to
rdma_destroy_id() until we've released the listening id mutex.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 366cddb4 04-Apr-2012 Amir Vadai <amirv@mellanox.com>

IB/rdma_cm: TOS <=> UP mapping for IBoE

Both tagged traffic and untagged traffic use the tc tool mapping.
Treat RDMA TOS the same as IP TOS when mapping to SL.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
CC: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 46ea5061 06-Dec-2011 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Fix endianness bugs

Fix endianness bugs reported by sparse in the RDMA core stack. Note
that these are real bugs, but don't affect any existing code to the
best of my knowledge. The mlid issue would only affect kernel users
of rdma_join_multicast which have the rdma_cm attach/detach its QP.
There are no current in tree users that do this. (rdma_join_multicast
may be called by user space applications, which do not have
this issue.) And the pkey setting is simply returned as
informational.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 04ded167 06-Dec-2011 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Verify private data length

private_data_len is defined as a u8. If the user specifies a large
private_data size (> 220 bytes), we will calculate a total length that
exceeds 255, resulting in private_data_len wrapping back to 0. This
can lead to overwriting random kernel memory. Avoid this by verifying
that the resulting size fits into a u8.
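
A standalone illustration of the wraparound and the added check (the header
size below is made up for the example):

#include <stdint.h>
#include <stdio.h>

#define HDR_LEN 36	/* illustrative header size only */

/* Return the total length, or 0 if it cannot be represented in the u8
 * field that carries private_data_len. */
static uint8_t total_priv_len(size_t user_len)
{
	size_t total = user_len + HDR_LEN;

	if (total > UINT8_MAX)
		return 0;	/* reject instead of silently wrapping */
	return (uint8_t)total;
}

int main(void)
{
	/* 230 + 36 = 266 would wrap to 10 in a u8 without the check. */
	printf("%u\n", (unsigned)total_priv_len(230));
	return 0;
}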

Reported-by: B. Thery <benjamin.thery@bull.net>
Addresses: <http://bugs.openfabrics.org/bugzilla/show_bug.cgi?id=2335>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 4e3fd7a0 20-Nov-2011 Alexey Dobriyan <adobriyan@gmail.com>

net: remove ipv6_addr_copy()

C assignment can handle struct in6_addr copying.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e4dd23d7 27-May-2011 Paul Gortmaker <paul.gortmaker@windriver.com>

infiniband: Fix up module files that need to include module.h

They had been getting it implicitly via device.h but we can't
rely on that in the future due to a pending cleanup, so fix
it now.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 18c441a6 29-May-2011 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Support XRC QPs

Allow users to connect XRC QPs through the rdma_cm.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 2d2e9415 28-May-2011 Sean Hefty <sean.hefty@intel.com>

RDMA/cm: Define new RDMA port space specific to IB

Add RDMA_PS_IB. XRC QP types will use the IB port space when operating
over the RDMA CM. For the 'IP protocol' field value, we select 0x3F,
which is listed as being for 'any local network'.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 3ebeebc3 25-Sep-2011 Kumar Sanghvi <kumaras@chelsio.com>

RDMA/iwcm: Propagate ird/ord values upwards

Update struct iw_cm_event to support propagating the ird/ord values
upwards to the application.

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# f45ee80e 06-Oct-2011 Hefty, Sean <sean.hefty@intel.com>

RDMA/cma: Check for NULL conn_param in rdma_accept

Check that conn_param is not null before dereferencing it when
processing rdma_accept(). This is necessary to prevent a possible
system crash, which can be caused by user space.

Problem found by code inspection.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 9595480c 06-Oct-2011 Hefty, Sean <sean.hefty@intel.com>

RDMA/cma: Fix crash in cma_req_handler

The RDMA CM uses the local qp_type to determine how to process an
incoming request. This can result in an incoming REQ being treated as
a SIDR REQ and vice versa. Fix this by switching off the event type
instead, and for good measure verify that the listener supports the
incoming connection request.

This problem showed up when a user space application mismatched the QP
types between a client and server app.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 2efdd6a0 12-Jul-2011 Moni Shoua <monis@mellanox.co.il>

RDMA/cma: Don't allow IPoIB port space for IBoE

This patch fixes a kernel crash in cma_set_qkey().

When the link layer is Ethernet, it is wrong to use IPoIB port space
since no IPoIB interface is available. Specifically, setting the
Q_Key when port space is RDMA_PS_IPOIB requires MGID calculation and
an SA query, which doesn't make sense over Ethernet.

Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 0c9361fc 17-Jul-2011 Jack Morgenstein <jackm@dev.mellanox.co.il>

RDMA/cma: Avoid assigning an IS_ERR value to cm_id pointer in CMA id object

Avoid assigning an IS_ERR value to the cm_id pointer. This fixes a
few anomalies in the error flow due to confusion about checking for
NULL vs IS_ERR, and eliminates the need to test for the IS_ERR value
every time we wish to determine if the cma_id object has a cm device
associated with it.

Also, eliminate the now-unnecessary procedure cma_has_cm_dev (we can
check directly for the existence of the device pointer -- for a
non-NULL check, makes no difference if it is the iwarp or the ib
pointer).

Finally, make a few code changes here to improve coding consistency.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 83e9502d 13-Jan-2011 Nir Muchtar <nirm@voltaire.com>

RDMA/cma: Save PID of ID's owner

Save the PID associated with an RDMA CM ID for reporting via netlink.

Signed-off-by: Nir Muchtar <nirm@voltaire.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 753f618a 03-Jan-2011 Nir Muchtar <nirm@voltaire.com>

RDMA/cma: Add support for netlink statistics export

Add callbacks and data types for statistics export of all current
devices/ids. The schema for RDMA CM is a series of netlink messages.
Each one contains an rdma_cm_stat struct. Additionally, two netlink
attributes are created for the addresses for each message (if
applicable).

Their types used are:
RDMA_NL_RDMA_CM_ATTR_SRC_ADDR (The source address for this ID)
RDMA_NL_RDMA_CM_ATTR_DST_ADDR (The destination address for this ID)
sockaddr_* structs are encapsulated within these attributes.

In other words, every transaction contains a series of messages like:

-------message 1-------
struct rdma_cm_id_stats {
__u32 qp_num;
__u32 bound_dev_if;
__u32 port_space;
__s32 pid;
__u8 cm_state;
__u8 node_type;
__u8 port_num;
__u8 reserved;
}
RDMA_NL_RDMA_CM_ATTR_SRC_ADDR attribute - contains the source address
RDMA_NL_RDMA_CM_ATTR_DST_ADDR attribute - contains the destination address
-------end 1-------
-------message 2-------
struct rdma_cm_id_stats
RDMA_NL_RDMA_CM_ATTR_SRC_ADDR attribute
RDMA_NL_RDMA_CM_ATTR_DST_ADDR attribute
-------end 2-------

Signed-off-by: Nir Muchtar <nirm@voltaire.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# b26f9b99 01-Apr-2010 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Pass QP type into rdma_create_id()

The RDMA CM currently infers the QP type from the port space selected
by the user. In the future (eg with RDMA_PS_IB or XRC), there may not
be a 1-1 correspondence between port space and QP type. For netlink
export of RDMA CM state, we want to export the QP type to userspace,
so it is cleaner to explicitly associate a QP type to an ID.

Modify rdma_create_id() to allow the user to specify the QP type, and
use it to make our selections of datagram versus connected mode.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 550e5ca7 20-May-2011 Nir Muchtar <nirm@voltaire.com>

RDMA/cma: Export enum cma_state in <rdma/rdma_cm.h>

Move cma.c's internal definition of enum cma_state to enum rdma_cm_state
in an exported header so that it can be exported via RDMA netlink.

Signed-off-by: Nir Muchtar <nirm@voltaire.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# a9bb7912 09-May-2011 Hefty, Sean <sean.hefty@intel.com>

RDMA/cma: Add an ID_REUSEADDR option

Lustre requires that clients bind to a privileged port number before
connecting to a remote server. On larger clusters (typically more
than about 1000 nodes), the number of privileged ports is exhausted,
resulting in lustre being unusable.

To handle this, we add support for reusable addresses to the rdma_cm.
This mimics the behavior of the socket option SO_REUSEADDR. A user
may set an rdma_cm_id to reuse an address before calling
rdma_bind_addr() (explicitly or implicitly). If set, other
rdma_cm_id's may be bound to the same address, provided that they all
have reuse enabled, and there are no active listens.

If rdma_listen() is called on an rdma_cm_id that has reuse enabled, it
will only succeed if there are no other id's bound to that same
address. The reuse option is exported to user space. The behavior of
the kernel reuse implementation was verified against that given by
sockets.

This patch is derived from a patch by Ira Weiny <weiny2@llnl.gov>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 43b752da 09-May-2011 Hefty, Sean <sean.hefty@intel.com>

RDMA/cma: Fix handling of IPv6 addressing in cma_use_port

cma_use_port() assumes that the sockaddr is an IPv4 address. Since
IPv6 addressing is supported (and also to support other address
families), make the code more generic in its address handling.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# a396d43a 23-Feb-2011 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Replace global lock in rdma_destroy_id() with id-specific one

rdma_destroy_id currently uses the global rdma cm 'lock' to test if an
rdma_cm_id has been bound to a device. This prevents an active
address resolution callback handler from assigning a device to the
rdma_cm_id after rdma_destroy_id checks for one.

Instead, we can replace the use of the global lock around the check to
the rdma_cm_id device pointer by setting the id state to destroying,
then flushing all active callbacks. The latter is accomplished by
acquiring and releasing the handler_mutex. Any active handler will
complete first, and any newly scheduled handlers will find the
rdma_cm_id in an invalid state.

In addition to optimizing the current locking scheme, the use of the
rdma_cm_id mutex is a more intuitive synchronization mechanism than
that of the global lock. These changes are based on feedback from
Doug Ledford <dledford@redhat.com> while he was trying to debug a
crash in the rdma cm destroy path.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 25ae21a1 23-Feb-2011 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Fix crash in request handlers

Doug Ledford and Red Hat reported a crash when running the rdma_cm on
a real-time OS. The crash has the following call trace:

cm_process_work
cma_req_handler
cma_disable_callback
rdma_create_id
kzalloc
init_completion
cma_get_net_info
cma_save_net_info
cma_any_addr
cma_zero_addr
rdma_translate_ip
rdma_copy_addr
cma_acquire_dev
rdma_addr_get_sgid
ib_find_cached_gid
cma_attach_to_dev
ucma_event_handler
kzalloc
ib_copy_ah_attr_to_user
cma_comp

[ preempted ]

cma_write
copy_from_user
ucma_destroy_id
copy_from_user
_ucma_find_context
ucma_put_ctx
ucma_free_ctx
rdma_destroy_id
cma_exch
cma_cancel_operation
rdma_node_get_transport

rt_mutex_slowunlock
bad_area_nosemaphore
oops_enter

They were able to reproduce the crash multiple times with the
following details:

Crash seems to always happen on the:
mutex_unlock(&conn_id->handler_mutex);
as conn_id looks to have been freed during this code path.

An examination of the code shows that a race exists in the request
handlers. When a new connection request is received, the rdma_cm
allocates a new connection identifier. This identifier has a single
reference count on it. If a user calls rdma_destroy_id() from another
thread after receiving a callback, rdma_destroy_id will proceed to
destroy the id and free the associated memory. However, the request
handlers may still be in the process of running. When control returns
to the request handlers, they can attempt to access the newly created
identifiers.

Fix this by holding a reference on the newly created rdma_cm_id until
the request handler is through accessing it.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# af7bd463 26-Aug-2010 Eli Cohen <eli@dev.mellanox.co.il>

IB/core: Add VLAN support for IBoE

Add 802.1q VLAN support to IBoE. The VLAN tag is encoded within the
GID derived from a link local address in the following way:

GID[11] GID[12] contain the VLAN ID when the GID contains a VLAN.

The 3-bit user priority field of the packets is identical to the 3
bits of the SL.

In the case of rdma_cm apps, the TOS field is used to generate the SL
field by doing a shift right of 5 bits, effectively taking the 3 MS bits
of the TOS field.
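
Roughly, per the description above (an illustrative sketch, not the
patch itself; both helper names are hypothetical):

    /* SL is taken from the 3 most significant bits of the ToS. */
    static u8 tos_to_sl(u8 tos)
    {
            return tos >> 5;
    }

    /* Bytes 11 and 12 of a link-local GID carry the VLAN ID. */
    static u16 gid_to_vlan_id(const union ib_gid *gid)
    {
            return (gid->raw[11] << 8) | gid->raw[12];
    }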

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 3c86aa70 13-Oct-2010 Eli Cohen <eli@dev.mellanox.co.il>

RDMA/cm: Add RDMA CM support for IBoE devices

Add support for IBoE device binding and IP --> GID resolution. Path
resolving and multicast joining are implemented within cma.c by
filling in the responses and running callbacks in the CMA work queue.

IP --> GID resolution always yields IPv6 link local addresses; remote
GIDs are derived from the destination MAC address of the remote port.
Multicast GIDs are always mapped to multicast MACs as is done in IPv6.
(IPv4 multicast is enabled by translating IPv4 multicast addresses to
IPv6 multicast as described in
<http://www.mail-archive.com/ipng@sunroof.eng.sun.com/msg02134.html>.)

Some helper functions are added to ib_addr.h.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 9893e742 15-May-2010 Julia Lawall <julia@diku.dk>

IB/core: Use kmemdup() instead of kmalloc()+memcpy()

Use kmemdup when some other buffer is immediately copied into the
allocated region.

A simplified version of the semantic patch that makes this change is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression from,to,size,flag;
statement S;
@@

- to = \(kmalloc\|kzalloc\)(size,flag);
+ to = kmemdup(from,size,flag);
if (to==NULL || ...) S
- memcpy(to, from, size);
// </smpl>
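
For illustration, the same transformation in plain C (hypothetical
variables, error handling abbreviated):

    /* Before: allocate, check, then copy. */
    to = kmalloc(size, GFP_KERNEL);
    if (!to)
            return -ENOMEM;
    memcpy(to, from, size);

    /* After: kmemdup() allocates and copies in one call. */
    to = kmemdup(from, size, GFP_KERNEL);
    if (!to)
            return -ENOMEM;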

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 5d7220e8 14-Apr-2010 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

RDMA/cma: Randomize local port allocation

Randomize local port allocation in the way sctp_get_port_local() does.
Update rover at the end of the loop since we're likely to pick a valid port
on the first try.
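
A rough sketch of the allocation pattern described (illustrative names;
port_in_use() is hypothetical):

    unsigned int remaining = (high - low) + 1;
    unsigned int rover, i;
    u32 rand;

    get_random_bytes(&rand, sizeof(rand));
    rover = low + (rand % remaining);        /* random starting point */
    for (i = 0; i < remaining; i++, rover++) {
            if (rover > high)
                    rover = low;             /* wrap within the range */
            if (!port_in_use(rover))
                    break;                   /* usually hits on the first try */
    }
    last_used_port = rover;                  /* update the rover only at the end */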

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# ae2d9293 25-Mar-2010 Sean Hefty <sean.hefty@intel.com>

RDMA/cm: Set num_paths when manually assigning path records

When manually assigning the path records to use for a connection, save
the number of paths that were set. Otherwise, checks against num_paths
will show 0, even though path record data is available.

This was discovered by manually setting the path records from user
space, then querying the kernel to see if the correct path records
were assigned, only to discover that the kernel returned 0 path
records to the query.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 5a0e3ad6 24-Mar-2010 Tejun Heo <tj@kernel.org>

include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch a large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surroundings. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have a fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>


# 8523c048 08-Feb-2010 Sean Hefty <sean.hefty@intel.com>

RDMA/cm: Revert association of an RDMA device when binding to loopback

Revert the following change from commit 6f8372b6 ("RDMA/cm: fix
loopback address support")

The defined behavior of rdma_bind_addr is to associate an RDMA
device with an rdma_cm_id, as long as the user specified a non-
zero address. (ie they weren't just trying to reserve a port)
Currently, if the loopback address is passed to rdma_bind_addr,
no device is associated with the rdma_cm_id. Fix this.

It turns out that important apps such as Open MPI depend on
rdma_bind_addr() NOT associating any RDMA device when binding to a
loopback address. Open MPI is being updated to deal with this, but at
least until a new Open MPI release is available, maintain the previous
behavior: allow rdma_bind_addr() to succeed, but do not bind to a
device.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# fd4582a3 06-Jan-2010 Robert P. J. Day <rpjday@crashcourse.ca>

IB/addr: Correct CONFIG_IPv6 to CONFIG_IPV6

Correct misspelled "CONFIG_IPv6" that was introduced in commit
d14714df ("IB/addr: Fix IPv6 routing lookup"). The config variable
should be all uppercase.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>

[ This was my fault when I munged the original patch. - Roland ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>


# d14714df 19-Nov-2009 Sean Hefty <sean.hefty@intel.com>

IB/addr: Fix IPv6 routing lookup

Include link scope as part of address resolution. Combine local
and remote address resolution into a single, simpler code path.
Fix error checking in the IPv6 routing lookups.

Based on work from:
David Wilder <dwilder@us.ibm.com>
Jason Gunthorpe <jgunthorpe@obsidianresearch.com>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

[ Fix up cma_check_linklocal() for !IPV6 case. - Roland ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 6f8372b6 19-Nov-2009 Sean Hefty <sean.hefty@intel.com>

RDMA/cm: fix loopback address support

The RDMA CM is intended to support the use of a loopback address
when establishing a connection; however, the behavior of the CM
when loopback addresses are used is confusing and does not always
work, depending on whether loopback was specified by the server,
the client, or both.

The defined behavior of rdma_bind_addr is to associate an RDMA
device with an rdma_cm_id, as long as the user specified a non-
zero address. (ie they weren't just trying to reserve a port)
Currently, if the loopback address is passed to rdma_bind_addr,
no device is associated with the rdma_cm_id. Fix this.

If a loopback address is specified by the client as the destination
address for a connection, it will fail to establish a connection.
This is true even if the server is listening across all addresses or
on the loopback address itself. The issue is that the server tries
to translate the IP address carried in the REQ message to a local
net_device address, which fails. The translation is not needed in
this case, since the REQ carries the actual HW address that should
be used.

Finally, clean up loopback support to be more transport neutral.
Replace the separate calls to get/set the sgid and dgid from the
device address with a single call that behaves correctly depending
on the format of the device address, and support both IPv4 and
IPv6 address formats.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

[ Fixed RDS build by s/ib_addr_get/rdma_addr_get/ - Roland ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>


# c4315d85 19-Nov-2009 Sean Hefty <sean.hefty@intel.com>

IB/addr: Store net_device type instead of translating to RDMA transport

The struct rdma_dev_addr stores net_device address information:
the source device address, destination hardware address, and
broadcast address. For consistency, store the net_device type
rather than converting it to the rdma_node_type.

The type indicates the format of the various hardware addresses,
which is what we're concerned with, and not the RDMA node type
that the address may map to.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 6266ed6e 19-Nov-2009 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Replace net_device pointer with index

Provide the device interface when resolving route information to
ensure that the correct outbound device is used. This will also
simplify processing of sin6_scope_id for IPv6 support.

Based on work from:
David Wilder <dwilder@us.ibm.com>
Jason Gunthorpe <jgunthrope@obsidianresearch.com>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# e2e62697 19-Nov-2009 Jason Gunthorpe <jgg@ziepe.ca>

RDMA/cma: Fix AF_INET6 support in multicast joining

If joining to an AF_INET6 address, we need to map the address to an MGID
in the same way as the IP stack. The old code would just fall through to
the IPv4 case and generate garbage.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 1c9b2819 19-Nov-2009 Jason Gunthorpe <jgg@ziepe.ca>

RDMA/cma: Correct detection of SA Created MGID

RDMA CM treats AF_INET6 addresses that are either 0 or prefixed with
FF1x:A01B::/32 as MGIDs, but the detection for the prefix was buggy;
fix it up.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 716abb1f 23-Jun-2009 Peter Huewe <peterhuewe@gmx.de>

RDMA: Add __init/__exit macros to addr.c and cma.c

Add __init and __exit annotations to the module_init/module_exit
functions from drivers/infiniband/core/addr.c and cma.c.

Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# d2ca39f2 08-Apr-2009 Yossi Etigin <root@voltaire.com>

RDMA/cma: Create cm id even when IB port is down

When doing rdma_resolve_addr(), if the relevant IB port is down, the
function fails and the cm_id is not bound to the correct device.
Therefore, application does not have a device handle and cannot wait
for the port to become active. The function fails because the
underlying IPoIB interface is not joined to the broadcast group and
therefore the SA does not have a multicast record to take a Q_Key
from.

The fix is to use lazy Q_Key resolution - cma_set_qkey() will set
id_priv->qkey if it was not set, and will be called just before the
Q_Key is really required.

Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 84adeee9 01-Apr-2009 Yossi Etigin <yosefe@Voltaire.COM>

RDMA/cma: Use rate from IPoIB broadcast when joining IPoIB multicast groups

When joining an IPoIB multicast group, use the same rate as in the
broadcast group. Otherwise, if the RDMA CM creates this group before
IPoIB does, it might get a different rate. This will cause IPoIB to
fail joining to the same group later on, because IPoIB uses strict
rate selection.

Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 1f5175ad 24-Dec-2008 Aleksey Senin <alekseys@voltaire.com>

RDMA/cma: Add IPv6 support

Handle AF_INET6 cases where required, and use struct sockaddr_storage
wherever an IPv6 address might be stored.

Signed-off-by: Aleksey Senin <aleksey@alst60.(none)>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 3f446754 04-Aug-2008 Roland Dreier <rolandd@cisco.com>

RDMA/cma: Remove padding arrays by using struct sockaddr_storage

There are a few places where the RDMA CM code handles IPv6 by doing

    struct sockaddr addr;
    u8 pad[sizeof(struct sockaddr_in6) -
           sizeof(struct sockaddr)];

This is fragile and ugly; handle this in a better way with just

    struct sockaddr_storage addr;

[ Also roll in patch from Aleksey Senin <alekseys@voltaire.com> to
switch to struct sockaddr_storage and get rid of padding arrays in
struct rdma_addr. ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 38ca83a5 22-Jul-2008 Amir Vadai <amirv@mellanox.co.il>

RDMA/cma: Add RDMA_CM_EVENT_TIMEWAIT_EXIT event

Consumers that want to re-use their QPs in new connections need to
know when the QP has exited the timewait state. Report the timewait
event through the rdma_cm.

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# dd5bdff8 22-Jul-2008 Or Gerlitz <ogerlitz@voltaire.com>

RDMA/cma: Add RDMA_CM_EVENT_ADDR_CHANGE event

Add an RDMA_CM_EVENT_ADDR_CHANGE event that can be used by rdma-cm
consumers that wish to have their RDMA sessions always use the same
links (eg <hca/port>) as the IP stack does. In the current code, this
does not happen when bonding is used and a fail-over has happened but
the IB link used by an already existing session is still operating fine.

Use the netevent notification for sensing that a change has happened
in the IP stack, then scan the rdma-cm ID list to see if there is an
ID that is "misaligned" with respect to the IP stack, and deliver
RDMA_CM_EVENT_ADDR_CHANGE for this ID. The consumer can act on the
event or just ignore it.
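
A minimal consumer-side sketch (the handler and schedule_reconnect()
are hypothetical; a real consumer would typically tear down and
re-establish the affected session):

    static int my_cm_handler(struct rdma_cm_id *id,
                             struct rdma_cm_event *event)
    {
            switch (event->event) {
            case RDMA_CM_EVENT_ADDR_CHANGE:
                    /* The IP stack now prefers a different link; either
                     * migrate the session or simply ignore the event. */
                    schedule_reconnect(id->context);
                    break;
            default:
                    break;
            }
            return 0;
    }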

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# de910bd9 15-Jul-2008 Or Gerlitz <ogerlitz@voltaire.com>

RDMA/cma: Simplify locking needed for serialization of callbacks

The RDMA CM has some logic in place to make sure that callbacks on a
given CM ID are delivered to the consumer in a serialized manner.
Specifically it has code to protect against a device removal racing
with a running callback function.

This patch simplifies this logic by using a mutex per ID instead of a
wait queue and atomic variable. This means that cma_disable_remove()
is now more properly renamed to cma_disable_callback(), and
cma_enable_remove() can now be removed because it would just become a
trivial wrapper around mutex_unlock().

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 64c5e613 15-Jul-2008 Or Gerlitz <ogerlitz@voltaire.com>

RDMA/addr: Keep pointer to netdevice in struct rdma_dev_addr

Keep a pointer to the local (src) netdevice in struct rdma_dev_addr,
and copy it in as part of rdma_copy_addr(). Use rdma_translate_ip()
in cma_new_conn_id() to reduce some code duplication and also make
sure the src_dev member gets set.

In a high-availability configuration the netdevice pointer can be used
by the RDMA CM to align RDMA sessions to use the same links as the IP
stack does under fail-over and route change cases.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 468f2239 15-Jul-2008 Roland Dreier <rolandd@cisco.com>

RDMA/cma: Add missing newlines to printk()s

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>


# a9474917 15-Jul-2008 Sean Hefty <sean.hefty@intel.com>

RDMA: Fix license text

The license text for several files references a third software license
that was inadvertently copied in. Update the license to what was
intended. This update was based on a request from HP.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 10f32065 16-Apr-2008 Julia Lawall <julia@diku.dk>

RDMA/iwcm: Test rdma_create_id() for IS_ERR rather than 0

The function rdma_create_id() always returns either a valid pointer or
a value made with ERR_PTR, so its result should be tested with IS_ERR,
not with a test for 0.

The problem was found using the following semantic match.
(http://www.emn.fr/x-info/coccinelle/)

//<smpl>
@a@
expression E, E1;
statement S,S1;
position p;
@@

E = rdma_create_id(...)
... when != E = E1
if@p (E) S else S1

@n@
position a.p;
expression E,E1;
statement S,S1;
@@

E = NULL
... when != E = E1
if@p (E) S else S1

@depends on !n@
expression E;
statement S,S1;
position a.p;
@@

* if@p (E)
S else S1
//</smpl>
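
The corrected pattern, illustrated (a sketch using the rdma_create_id()
signature of that era; 'my_handler' and 'ctx' are hypothetical):

    struct rdma_cm_id *id;

    id = rdma_create_id(my_handler, ctx, RDMA_PS_TCP);
    if (IS_ERR(id))                 /* correct: test with IS_ERR() */
            return PTR_ERR(id);     /* never 'if (!id)' */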

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 1b90c137 28-Mar-2008 Al Viro <viro@ftp.linux.org.uk>

trivial endianness annotations: infiniband core

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ead595ae 13-Feb-2008 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Do not issue MRA if user rejects connection request

There's an undesirable interaction between issuing MRA requests to
increase connection timeouts and the listen backlog.

When the rdma_cm receives a connection request, it queues an MRA with
the ib_cm. (The ib_cm will send an MRA if it receives a duplicate
REQ.) The rdma_cm will then create a new rdma_cm_id and give that to
the user, which in this case is the rdma_user_cm.

If the listen backlog maintained in the rdma_user_cm is full, it
destroys the rdma_cm_id, which in turn destroys the ib_cm_id. The
ib_cm_id generates a REJ because the state of the ib_cm_id has changed
to MRA sent, versus REQ received. When the backlog is full, we just
want to drop the REQ so that it is retried later.

Fix this by deferring queuing the MRA until after the user of the
rdma_cm has examined the connection request.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 1ab35276 22-Jan-2008 Denis V. Lunev <den@openvz.org>

[NETNS]: Add namespace parameter to ip_dev_find.

ip_dev_find() needs a namespace to pass to fib_get_table(), so add
an argument.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6360a02a 16-Dec-2007 Joe Perches <joe@perches.com>

[IPV4] drivers/infiniband: Use ipv4_is_<type>
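
For example (an illustrative sketch; the wrapper function is
hypothetical), open-coded address-class tests become helper calls:

    #include <linux/in.h>

    static bool dst_is_loopback(__be32 addr)
    {
            /* Replaces the open-coded test
             * (addr & htonl(0xff000000)) == htonl(0x7f000000). */
            return ipv4_is_loopback(addr);
    }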

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5851bb89 04-Jan-2008 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Override default responder_resources with user value

By default, the responder_resources parameter is set to that received
in a connection request. The passive side may override this value
when accepting the connection. Use the value provided by the passive
side when transitioning the QP to RTR state, rather than the value
given in the connect request. Without this change, the RTR transition
may fail if the passive side supports fewer responder_resources than
that in the request.

For code consistency and to protect against QP destruction, restructure
overriding initiator_depth to match how responder_resources is set.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# a9e527e3 10-Dec-2007 Rolf Manderscheid <rvm@obsidianresearch.com>

IPoIB: improve IPv4/IPv6 to IB mcast mapping functions

An IPoIB subnet on an IB fabric that spans multiple IB subnets can't
use link-local scope in multicast GIDs. The existing routines that
map IP/IPv6 multicast addresses into IB link-level addresses hard-code
the scope to link-local, and they also leave the partition key field
uninitialised. This patch adds a parameter (the link-level broadcast
address) to the mapping routines, allowing them to initialise both the
scope and the P_Key appropriately, and fixes up the call sites.

The next step will be to add a way to configure the scope for an IPoIB
interface.

Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 45d9478d 07-Dec-2007 Vladimir Sokolovsky <vlad@mellanox.co.il>

RDMA/cma: Reenable device removal on passive side

Enable conn_id removal on the passive side after connection
establishment. This corrects an issue where the IB driver can't be
unloaded after running applications over RDS. The 'dev_remove' counter
does not reach 0 for established connections on the passive side.

This problem is limited to device removal, and only occurs on the
passive side if there are established connections.

Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 8d8293cf 29-Oct-2007 Steve Wise <larrystevenwise@gmail.com>

RDMA/iwcm: Set initiator depth and responder resources to device max values

Set the initiator depth and responder resources to the device max
values for new connect request events in the iWARP connection manager.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# a25de534 18-Oct-2007 Anton Arapov <aarapov@redhat.com>

[INET]: Justification for local port range robustness.

This is a justifying patch for Stephen's patches. Stephen's patches
disallow using a port range of one single port and break the meaning
of the 'remaining' variable; in some places it has a different meaning.
My patch restores the sense of the 'remaining' variable: it should mean
how many ports are remaining and nothing else. Also, my patch allows
using a single port.

I am sure we must be able to use the mentioned port range; this is not
restricted by documentation and does not break current behavior.

Useful links:
Patches posted by Stephen Hemminger
http://marc.info/?l=linux-netdev&m=119206106218187&w=2
http://marc.info/?l=linux-netdev&m=119206109918235&w=2

Andrew Morton's comment
http://marc.info/?l=linux-kernel&m=119248225007737&w=2

1. Allows using a port range of one single port.
2. Gives back sense of 'remaining' variable.

Signed-off-by: Anton Arapov <aarapov@redhat.com>
Acked-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d02d1f53 09-Oct-2007 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Fix deadlock destroying listen requests

Deadlock condition reported by Kanoj Sarcar <kanoj@netxen.com>.
The deadlock occurs when a connection request arrives at the same
time that a wildcard listen is being destroyed.

A wildcard listen maintains per device listen requests for each
RDMA device in the system. The per device listens are automatically
added and removed when RDMA devices are inserted or removed from
the system.

When a wildcard listen is destroyed, rdma_destroy_id() acquires
the rdma_cm's device mutex ('lock') to protect against hot-plug
events adding or removing per device listens. It then tries to
destroy the per device listens by calling ib_destroy_cm_id() or
iw_destroy_cm_id(). It does this while holding the device mutex.

However, if the underlying iw/ib CM reports a connection request
while this is occurring, the rdma_cm callback function will try
to acquire the same device mutex. Since we're in a callback,
the ib_destroy_cm_id() or iw_destroy_cm_id() calls will block until
their callback thread returns, but the callback is blocked waiting for
the device mutex.

Fix this by re-working how per device listens are destroyed. Use
rdma_destroy_id(), which avoids the deadlock, in place of
cma_destroy_listen(). Additional synchronization is added to handle
device hot-plug events and ensure that the id is not destroyed twice.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# c5483388 24-Sep-2007 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Add locking around QP accesses

If a user allocates a QP on an rdma_cm_id, the rdma_cm will automatically
transition the QP through its states (RTR, RTS, error, etc.) While the
QP state transitions are occurring, the QP itself must remain valid.
Provide locking around the QP pointer to prevent its destruction while
accessing the pointer.

This fixes an issue reported by Olaf Kirch from Oracle that resulted in
a system crash:

"An incoming connection arrives and we decide to tear down the nascent
connection. The remote ends decides to do the same. We start to shut
down the connection, and call rdma_destroy_qp on our cm_id. ... Now
apparently a 'connect reject' message comes in from the other host,
and cma_ib_handler() is called with an event of IB_CM_REJ_RECEIVED.
It calls cma_modify_qp_err, which for some odd reason tries to modify
the exact same QP we just destroyed."
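
The pattern described, roughly (a sketch of the idea, not the exact
patch; qp_attr/qp_attr_mask are assumed to be set up by the caller):

    mutex_lock(&id_priv->qp_mutex);
    if (!id_priv->id.qp) {                  /* QP already released */
            mutex_unlock(&id_priv->qp_mutex);
            return 0;
    }
    ret = ib_modify_qp(id_priv->id.qp, &qp_attr, qp_attr_mask);
    mutex_unlock(&id_priv->qp_mutex);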

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 227b60f5 10-Oct-2007 Stephen Hemminger <shemminger@linux-foundation.org>

[INET]: local port range robustness

Expansion of original idea from Denis V. Lunev <den@openvz.org>

Add robustness and locking to the local_port_range sysctl.
1. Enforce that low < high when setting.
2. Use seqlock to ensure atomic update.

The locking might seem like overkill, but there are
cases where a sysadmin might want to change the value in the
middle of a DoS attack.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# dcb3f974 01-Aug-2007 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Queue IB CM MRAs to avoid unnecessary remote retries

Automatically queue an MRA message to decrease the number of retries sent
by the remote side during connection establishment. This also has the
effect of increasing the overall connection timeout without using a
longer retry time in the case of dropped packets.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# a81c994d 08-Aug-2007 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Add ability to specify type of service

Provide support to specify a type of service for a communication
identifier. A new function call is used when dealing with IPv4
addresses. For IPv6 addresses, the ToS is specified through the
traffic class field in the sockaddr_in6 structure.
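
A minimal sketch, assuming the new IPv4-path call is
rdma_set_service_type() and that the ToS value shown is arbitrary:

    #include <rdma/rdma_cm.h>

    /* Request a type of service for this communication identifier. */
    rdma_set_service_type(cm_id, 0x28);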

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

[ The comments Eitan Zahavi and myself have made over the v1 post at
<http://lists.openfabrics.org/pipermail/general/2007-August/039247.html>
were fully addressed. ]

Reviewed-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 8f076531 17-Jul-2007 Dotan Barak <dotanb@dev.mellanox.co.il>

RDMA/cma: Remove local write permission from QP access flags

Local write permission makes no sense as part of the QP access flags,
since the access flags only control what the remote end of the
connection is allowed to do. Remove the code in the RDMA CM that
initializes qp_access_flags with IB_ACCESS_LOCAL_WRITE.

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 1d846126 18-Jun-2007 Sean Hefty <sean.hefty@intel.com>

IB/cm: Include HCA ACK delay in local ACK timeout

The IB CM should include the HCA ACK delay when calculating the local
ACK timeout value to use for RC QPs. If the HCA ACK delay is large
enough relative to the packet life time, then if it is not taken into
account, the calculated timeout value ends up being too small, which
can result in "retry exceeded" errors.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# bf2944bd 05-Jun-2007 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Fix initialization of next_port

next_port should be between sysctl_local_port_range[0] and [1].
However, it is initially set to a random value with get_random_bytes().
If the value is negative when treated as a signed integer, next_port
can end up outside the expected range because of the result of the %
operator being negative.
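
A sketch of the kind of fix implied, keeping the intermediate value
unsigned so the remainder cannot be negative (names illustrative):

    unsigned int low = sysctl_local_port_range[0];
    unsigned int high = sysctl_local_port_range[1];
    u32 rand;

    get_random_bytes(&rand, sizeof(rand));
    /* With an unsigned operand, rand % range is always in [0, range). */
    next_port = low + (rand % (high - low + 1));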

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 6c719f5c 07-May-2007 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Add check to validate that cm_id is bound to a device

Several checks in the rdma_cm check against the state of the
cm_id, but only to validate that the cm_id is bound to an underlying
transport specific CM and an RDMA device. Make the check explicit
in what we're trying to check for, since we're not synchronizing
against the cm_id state.

This will allow a user to disconnect a cm_id or reject a connection
after receiving a device removal event.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# be65f086 07-May-2007 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Fix synchronization with device removal in cma_iw_handler

The cma_iw_handler needs to validate the state of the rdma_cm_id before
processing a new connection request to ensure that a device removal is
not already being processed for the same rdma_cm_id. Without the state
check, the user can receive simultaneous callbacks for the same cm_id, or
a callback after they've destroyed the cm_id.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 8aa08602 07-May-2007 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Simplify device removal handling code

Add a new routine and rename another to encapsulate common code for
synchronizing with device removal.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# cb164b8c 05-Mar-2007 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Initialize rdma_bind_list in cma_alloc_any_port()

The struct rdma_bind_list fields for hlist are not being initialized,
resulting in a corrupted list. Fix this by using kzalloc() to make
sure all pointers are NULL.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 1836854f 22-Feb-2007 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Remove unused node_guid from cma_device structure

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 962063e6 21-Feb-2007 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Request reversible paths only

The rdma_cm requires that path records be reversible. Set the
reversible bit when issuing a path record query.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# c8f6a362 15-Feb-2007 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Add multicast communication support

Extend rdma_cm to support multicast communication. Multicast support
is added to the existing RDMA_PS_UDP port space, as well as a new
RDMA_PS_IPOIB port space. The latter port space allows joining the
multicast groups used by IPoIB, which enables offloading IPoIB traffic
to a separate QP. The port space determines the signature used in the
MGID when joining the group. The newly added RDMA_PS_IPOIB also
allows for unicast operations, similar to RDMA_PS_UDP.

Supporting the RDMA_PS_IPOIB requires changing how UD QPs are initialized,
since we can no longer assume that the qkey is constant. This requires
saving the Q_Key to use when attaching to a device, so that it is
available when creating the QP. The Q_Key information is exported to
the user through the existing rdma_init_qp_attr() interface.

Multicast support is also exported to userspace through the rdma_ucm.
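
A rough sketch of the exported kernel interface as added here (before
the later join-state argument; the multicast sockaddr setup and event
handling are omitted):

    /* Completion is reported via an RDMA_CM_EVENT_MULTICAST_JOIN event. */
    ret = rdma_join_multicast(cm_id, (struct sockaddr *) &mcast_addr, ctx);

    /* ... use the group ... */

    rdma_leave_multicast(cm_id, (struct sockaddr *) &mcast_addr);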

Signed-off-by: Roland Dreier <rolandd@cisco.com>


# c7f743a6 01-Feb-2007 Sean Hefty <sean.hefty@intel.com>

IB: Remove redundant "_wq" from workqueue names

Signed-off-by: Roland Dreier <rolandd@cisco.com>


# aedec080 29-Jan-2007 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Increment port number after close to avoid re-use

Randomize the starting port number and avoid re-using port values
immediately after they are closed. Instead keep track of the last
port value used and increment it every time a new port number is
assigned, to better replicate other port spaces.

Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 881a045f 15-Dec-2006 Steve Wise <larrystevenwise@gmail.com>

RDMA/iwcm: iWARP connection timeouts shouldn't be reported as rejects

The iWARP CM should report timeouts as event RDMA_CM_EVENT_UNREACHABLE,
not event RDMA_CM_EVENT_REJECTED.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 628e5f6d 30-Nov-2006 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Add support for RDMA_PS_UDP

Allow the use of UD QPs through the rdma_cm, in order to provide
address translation services for resolving IB addresses for datagram
messages using SIDR.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 0fe313b0 30-Nov-2006 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Allow early transition to RTS to handle lost CM messages

During connection establishment, the passive side of a connection can
receive messages from the active side before the connection event has
been delivered to the user. Allow the passive side to send messages
in response to received data before the event is delivered. To handle
the case where the connection messages are lost, a new rdma_notify()
function is added that users may invoke to force a connection into the
established state.
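
A minimal sketch of the new call on the passive side (the event value
shown is the typical communication-established case):

    /* Data arrived before the CM handshake completed; force the
     * connection into the established state. */
    ret = rdma_notify(cm_id, IB_EVENT_COMM_EST);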

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# a1b1b61f 30-Nov-2006 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Report connect info with connect events

Connection information was never given to the recipient of a
connection request or reply message. Only the event was delivered.
Report the connection data with the event to allow the user to
reject the connection based on the requested parameters, or adjust
their resources to match the request.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 9b2e9c0c 30-Nov-2006 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Remove unneeded qp_type parameter from rdma_cm

The qp_type parameter into the rdma_cm is unneeded, and can be
misleading. The QP type should be determined from the port space.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# e31353ea 24-Oct-2006 Dotan Barak <dotanb@mellanox.co.il>

RDMA/cm: Remove setting local write as part of QP access flags

The qp_access_flags are for remote access permissions only, so
IB_ACCESS_LOCAL_WRITE is an invalid value. Remove it from the values
set by cm_init_qp_init_attr() and cma_init_ib_qp().

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# a1a733f6 16-Oct-2006 Krishna Kumar <krkumar2@in.ibm.com>

RDMA/cma: Rewrite cma_req_handler() to encapsulate common code

Rewrite cma_req_handler error handling case to encapsulate
common code.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# e4022274 15-Oct-2006 Krishna Kumar <krkumar2@in.ibm.com>

RDMA/cma: Remove redundant check in cma_add_one

Remove redundant check of node_guid in cma_add_one().

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# e82153b5 15-Oct-2006 Krishna Kumar <krkumar2@in.ibm.com>

RDMA/cma: Optimize cma_bind_loopback() to check for empty list

Optimize to test for an empty list first. This ends up simplifying
the code too.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# c4028958 22-Nov-2006 David Howells <dhowells@redhat.com>

WorkStruct: make allyesconfig

Fix up for make allyesconfig.

Signed-Off-By: David Howells <dhowells@redhat.com>


# 7a118df3 31-Oct-2006 Sean Hefty <sean.hefty@intel.com>

RDMA/addr: Use client registration to fix module unload race

Require registration with ib_addr module to prevent caller from
unloading while a callback is in progress.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 255d0c14 24-Oct-2006 Krishna Kumar <krkumar2@in.ibm.com>

RDMA/cma: rdma_bind_addr() leaks a cma_dev reference count

rdma_bind_addr() leaks a cma_dev reference count in failure case.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>


# 3f168d2b 29-Sep-2006 Krishna Kumar <krkumar2@in.ibm.com>

RDMA/cma: Optimize error handling

Reorganize code relating to cma_get_net_info() and rdma_create_id() to
optimize error case handling (no need to alloc memory/etc. as part of
rdma_create_id() if input parameters are wrong).

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 94de178a 29-Sep-2006 Krishna Kumar <krkumar2@in.ibm.com>

RDMA/cma: Eliminate unnecessary remove_list

Eliminate remove_list by using list_del_init() instead during device
removal handling.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 8f0472d3 29-Sep-2006 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Set status correctly on route resolution error

On reporting a route error, also include the status for the error,
rather than indicating a status of 0 when an error has occurred.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 6e35aabe 29-Sep-2006 Krishna Kumar <krkumar2@in.ibm.com>

RDMA/cma: Fix device removal race

The race is as follows:

A process : cma_process_remove() calls cma_remove_id_dev(),
            which sets id state to CMA_DEVICE_REMOVAL and
            calls wait_event(dev_remove).

B process : cma_req_handler() had incremented dev_remove,
            and calls cma_acquire_ib_dev() and on failure
            calls cma_release_remove(), which does a
            wake_up of cma_process_remove(). Then
            cma_req_handler() calls rdma_destroy_id();

A process : cma_remove_id_dev() gets woken and checks the
            state of id, and since it is still (wrongly)
            CMA_DEVICE_REMOVAL, it calls notify_user(id)
            and if that fails, the caller - cma_process_remove()
            calls rdma_destroy_id(id). Two processes can
            call rdma_destroy_id(), resulting in one
            de-referencing kfreed id_priv.

Fix is for process B to set CMA_DESTROYING in cma_req_handler()
so that process A will return instead of doing a rdma_destroy_id().

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 675a027c 29-Sep-2006 Krishna Kumar <krkumar2@in.ibm.com>

RDMA/cma: Fix leak of cm_ids in case of failures

cma_connect_ib() and cma_connect_iw() leak cm_id's in failure cases.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# c1a0b23b 21-Aug-2006 Michael S. Tsirkin <mst@mellanox.co.il>

IB/sa: Require SA registration

Require users to register with SA module, to prevent the sa_query
module text from going away while an SA query callback is still
running. Update all in-tree users for the new interface.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 61a73c70 01-Sep-2006 Sean Hefty <sean.hefty@intel.com>

RDMA/cma: Protect against adding device during destruction

Closes a window where address resolution can attach an rdma_cm_id to a
device during destruction of the rdma_cm_id. This can result in the
rdma_cm_id remaining in the device list after its memory has been
freed.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 07ebafba 03-Aug-2006 Tom Tucker <tom@opengridcomputing.com>

RDMA: iWARP Core Changes.

Modifications to the existing rdma header files, core files, drivers,
and ulp files to support iWARP, including:
- Hook iWARP CM into the build system and use it in rdma_cm.
- Convert enum ib_node_type to enum rdma_node_type, which includes
the possibility of RDMA_NODE_RNIC, and update everything for this.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 3cd96564 22-Sep-2006 Roland Dreier <rolandd@cisco.com>

IB: Whitespace fixes

Remove some trailing whitespace that has snuck in despite the best
efforts of whitespace=error-all. Also fix a few other whitespace
bogosities.

Signed-off-by: Roland Dreier <rolandd@cisco.com>


# d5bb7599 13-Sep-2006 Michael S. Tsirkin <mst@mellanox.co.il>

RDMA/cma: Increase the IB CM retry count in CMA

3 seems like a low number of IB Communication Manager retries to set;
we see connections failing under stress, and in any case 3 just looks
like an arbitrary number. 15 is the max value allowed by the
InfiniBand spec.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 74f76fba 14-Jul-2006 Ira Weiny <weiny2@llnl.gov>

[PATCH] IB/cm: set private data length for reject messages

Set private data length for reject messages to the correct size. Fix from
openib svn r8483.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# f0ee3404 14-Jul-2006 Michael S. Tsirkin <mst@mellanox.co.il>

[PATCH] IB/addr: gid structure alignment fix

The device address contains unsigned character arrays, which contain raw GID
addresses. The GIDs may not be naturally aligned, so do not cast them to
structures or unions.
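
Illustrative pattern (a sketch; 'gid_bytes' stands for the raw,
possibly unaligned byte array in the device address):

    union ib_gid gid;

    /* Copy the GID out of the byte array instead of casting the
     * array to a union ib_gid pointer. */
    memcpy(&gid, gid_bytes, sizeof(gid));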

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 5fd571cb 21-Jun-2006 Eric Sesterhenn <snakebyte@gmx.de>

[PATCH] Array overrun in drivers/infiniband/core/cma.c

This was spotted by coverity #id 1300. Since the array has only four
elements, we should just use those four.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# e51060f0 17-Jun-2006 Sean Hefty <sean.hefty@intel.com>

IB: IP address based RDMA connection manager

Kernel connection management agent over InfiniBand that connects based
on IP addresses. The agent defines a generic RDMA connection
abstraction to support clients wanting to connect over different RDMA
devices.

The agent also handles RDMA device hotplug events on behalf of clients.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>