History log of /linux-master/net/rds/ib_send.c
Revision Date Author Comments
# 4db6187d 27-Apr-2021 Jiapeng Chong <jiapeng.chong@linux.alibaba.com>

rds: Remove redundant assignment to nr_sig

Variable nr_sig is being assigned a value however the assignment is
never read, so this redundant assignment can be removed.

Cleans up the following clang-analyzer warning:

net/rds/ib_send.c:297:2: warning: Value stored to 'nr_sig' is never read
[clang-analyzer-deadcode.DeadStores].

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 42f2611c 06-Nov-2020 Christoph Hellwig <hch@lst.de>

rds: stop using dmapool

RDMA ULPs should only perform DMA through the ib_dma_* API instead of
using the hidden dma_device directly. In addition using the dma coherent
API family that dmapool is a part of can be very ineffcient on plaforms
that are not DMA coherent. Switch to use slab allocations and the
ib_dma_* APIs instead.

Link: https://lore.kernel.org/r/20201106181941.1878556-6-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 2eafa174 15-Jan-2020 Hans Westgaard Ry <hans.westgaard.ry@oracle.com>

net/rds: Handle ODP mr registration/unregistration

On-Demand-Paging MRs are registered using ib_reg_user_mr and
unregistered with ib_dereg_mr.

Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>


# 9b17f588 02-Oct-2019 Ka-Cheong Poon <ka-cheong.poon@oracle.com>

net/rds: Use DMA memory pool allocation for rds_header

Currently, RDS calls ib_dma_alloc_coherent() to allocate a large piece
of contiguous DMA coherent memory to store struct rds_header for
sending/receiving packets. The memory allocated is then partitioned
into struct rds_header. This is not necessary and can be costly at
times when memory is fragmented. Instead, RDS should use the DMA
memory pool interface to handle this. The DMA addresses of the pre-
allocated headers are stored in an array. At send/receive ring
initialization and refill time, this arrary is de-referenced to get
the DMA addresses. This array is not accessed at send/receive packet
processing.

Suggested-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fab401e1 01-Oct-2019 Sudhakar Dindukurti <sudhakar.dindukurti@oracle.com>

net/rds: Log vendor error if send/recv Work requests fail

Log vendor error if work requests fail. Vendor error provides
more information that is used for debugging the issue.

Signed-off-by: Sudhakar Dindukurti <sudhakar.dindukurti@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 616d37a0 18-Jun-2019 Santosh Shilimkar <santosh.shilimkar@oracle.com>

rds: fix reordering with composite message notification

RDS composite message(rdma + control) user notification needs to be
triggered once the full message is delivered and such a fix was
added as part of commit 941f8d55f6d61 ("RDS: RDMA: Fix the composite
message user notification"). But rds_send_remove_from_sock is missing
data part notify check and hence at times the user don't get
notification which isn't desirable.

One way is to fix the rds_send_remove_from_sock to check of that case
but considering the ordering complexity with completion handler and
rdma + control messages are always dispatched back to back in same send
context, just delaying the signaled completion on rmda work request also
gets the desired behaviour. i.e Notifying application only after
RDMA + control message send completes. So patch updates the earlier
fix with this approach. The delay signaling completions of rdma op
till the control message send completes fix was done by Venkat
Venkatsubra in downstream kernel.

Reviewed-and-tested-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Gerd Rausch <gerd.rausch@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>


# fd261ce6 13-Oct-2018 Santosh Shilimkar <santosh.shilimkar@oracle.com>

rds: rdma: update rdma transport for tos

For RDMA transports, RDS TOS is an extension of IB QoS(Annex A13)
to provide clients the ability to segregate traffic flows for
different type of data. RDMA CM abstract it for ULPs using
rdma_set_service_type(). Internally, each traffic flow is
represented by a connection with all of its independent resources
like that of a normal connection, and is differentiated by
service type. In other words, there can be multiple qp connections
between an IP pair and each supports a unique service type.

The feature has been added from RDSv4.1 onwards and supports
rolling upgrades. RDMA connection metadata also carries the tos
information to set up SL on end to end context. The original
code was developed by Bang Nguyen in downstream kernel back in
2.6.32 kernel days and it has evolved over period of time.

Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
[yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes]
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>


# a163afc8 31-Jan-2019 Bart Van Assche <bvanassche@acm.org>

IB/core: Remove ib_sg_dma_address() and ib_sg_dma_len()

Keeping single line wrapper functions is not useful. Hence remove the
ib_sg_dma_address() and ib_sg_dma_len() functions. This patch does not
change any functionality.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# eeb2c4fb 06-Jan-2019 Jacob Wen <jian.w.wen@oracle.com>

rds: use DIV_ROUND_UP instead of ceil

Yes indeed, DIV_ROUND_UP is in kernel.h.

Signed-off-by: Jacob Wen <jian.w.wen@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 87f70132 01-Aug-2018 YueHaibing <yuehaibing@huawei.com>

rds: remove redundant variable 'rds_ibdev'

Variable 'rds_ibdev' is being assigned but never used,
so can be removed.

fix this clang warning:
net/rds/ib_send.c:762:24: warning: variable ‘rds_ibdev’ set but not used [-Wunused-but-set-variable]

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d34ac5cd 18-Jul-2018 Bart Van Assche <bvanassche@acm.org>

RDMA, core and ULPs: Declare ib_post_send() and ib_post_recv() arguments const

Since neither ib_post_send() nor ib_post_recv() modify the data structure
their second argument points at, declare that argument const. This change
makes it necessary to declare the 'bad_wr' argument const too and also to
modify all ULPs that call ib_post_send(), ib_post_recv() or
ib_post_srq_recv(). This patch does not change any functionality but makes
it possible for the compiler to verify whether the
ib_post_(send|recv|srq_recv) really do not modify the posted work request.

To make this possible, only one cast had to be introduce that casts away
constness, namely in rpcrdma_post_recvs(). The only way I can think of to
avoid that cast is to introduce an additional loop in that function or to
change the data type of bad_wr from struct ib_recv_wr ** into int
(an index that refers to an element in the work request list). However,
both approaches would require even more extensive changes than this
patch.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# eee2fa6a 23-Jul-2018 Ka-Cheong Poon <ka-cheong.poon@oracle.com>

rds: Changing IP address internal representation to struct in6_addr

This patch changes the internal representation of an IP address to use
struct in6_addr. IPv4 address is stored as an IPv4 mapped address.
All the functions which take an IP address as argument are also
changed to use struct in6_addr. But RDS socket layer is not modified
such that it still does not accept IPv6 address from an application.
And RDS layer does not accept nor initiate IPv6 connections.

v2: Fixed sparse warnings.

Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a0c0865f 24-Oct-2017 Håkon Bugge <Haakon.Bugge@oracle.com>

rds: Fix inaccurate accounting of unsignaled wrs

The number of unsignaled work-requests posted to the IB send queue is
tracked by a counter in the rds_ib_connection struct. When it reaches
zero, or the caller explicitly asks for it, the send-signaled bit is
set in send_flags and the counter is reset. This is performed by the
rds_ib_set_wr_signal_state() function.

However, this function is not always used which yields inaccurate
accounting. This commit fixes this, re-factors a code bloat related to
the matter, and makes the actual parameter type to the function
consistent.

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e9a0b998 24-Oct-2017 Håkon Bugge <Haakon.Bugge@oracle.com>

rds: ib: Fix uninitialized variable

send_flags needs to be initialized before calling
rds_ib_set_wr_signal_state().

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d2c58294 17-Feb-2017 Zhu Yanjun <yanjun.zhu@oracle.com>

rds:Remove unnecessary ib_ring unalloc

In the function rds_ib_xmit_atomic, ib_ring is not allocated
successfully. As such, it is not necessary to unalloc it.

Cc: Joe Jin <joe.jin@oracle.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 941f8d55 18-Feb-2016 Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: RDMA: Fix the composite message user notification

When application sends an RDS RDMA composite message consist of
RDMA transfer to be followed up by non RDMA payload, it expect to
be notified *only* when the full message gets delivered. RDS RDMA
notification doesn't behave this way though.

Thanks to Venkat for debug and root casuing the issue
where only first part of the message(RDMA) was
successfully delivered but remainder payload delivery failed.
In that case, application should not be notified with
a false positive of message delivery success.

Fix this case by making sure the user gets notified only after
the full message delivery.

Reviewed-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>


# ff3f19a2 14-Mar-2016 Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: IB: include faddr in connection log

Also use pr_* for it.

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>


# 226f7a7d 30-Jun-2016 Sowmini Varadhan <sowmini.varadhan@oracle.com>

RDS: Rework path specific indirections

Refactor code to avoid separate indirections for single-path
and multipath transports. All transports (both single and mp-capable)
will get a pointer to the rds_conn_path, and can trivially derive
the rds_connection from the ->cp_conn.

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0cb43965 13-Jun-2016 Sowmini Varadhan <sowmini.varadhan@oracle.com>

RDS: split out connection specific state from rds_connection to rds_conn_path

In preparation for multipath RDS, split the rds_connection
structure into a base structure, and a per-path struct rds_conn_path.
The base structure tracks information and locks common to all
paths. The workqs for send/recv/shutdown etc are tracked per
rds_conn_path. Thus the workq callbacks now work with rds_conn_path.

This commit allows for one rds_conn_path per rds_connection, and will
be extended into multiple conn_paths in subsequent commits.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# dcfd041c 01-Mar-2016 santosh.shilimkar@oracle.com <santosh.shilimkar@oracle.com>

RDS: IB: Remove the RDS_IB_SEND_OP dependency

This helps to combine asynchronous fastreg MR completion handler
with send completion handler.

No functional change.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e622f2f4 08-Oct-2015 Christoph Hellwig <hch@lst.de>

IB: split struct ib_send_wr

This patch split up struct ib_send_wr so that all non-trivial verbs
use their own structure which embedds struct ib_send_wr. This dramaticly
shrinks the size of a WR for most common operations:

sizeof(struct ib_send_wr) (old): 96

sizeof(struct ib_send_wr): 48
sizeof(struct ib_rdma_wr): 64
sizeof(struct ib_atomic_wr): 96
sizeof(struct ib_ud_wr): 88
sizeof(struct ib_fast_reg_wr): 88
sizeof(struct ib_bind_mw_wr): 96
sizeof(struct ib_sig_handover_wr): 80

And with Sagi's pending MR rework the fast registration WR will also be
down to a reasonable size:

sizeof(struct ib_fastreg_wr): 64

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> [srp, srpt]
Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [sunrpc]
Tested-by: Haggai Eran <haggaie@mellanox.com>
Tested-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>


# 0c28c045 06-Sep-2015 Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: IB: split send completion handling and do batch ack

Similar to what we did with receive CQ completion handling, we split
the transmit completion handler so that it lets us implement batched
work completion handling.

We re-use the cq_poll routine and makes use of RDS_IB_SEND_OP to
identify the send vs receive completion event handler invocation.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>


# e5580242 30-Jul-2015 Jason Gunthorpe <jgg@ziepe.ca>

rds/ib: Remove ib_get_dma_mr calls

The pd now has a local_dma_lkey member which completely replaces
ib_get_dma_mr, use it instead.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 3049147c 22-Aug-2015 santosh.shilimkar@oracle.com <santosh.shilimkar@oracle.com>

RDS: Make sure we do a signaled send for large-send

WR(Work Requests )always generate a WC(Work Completion) with
signaled send. Default RDS ib code is setup for un-signaled
completion. Since RDS connction is persistent, we can end up
sending the data even after large-send when the remote end is
not active(for any reason).

By doing a signaled send at least once per large-send,
we can at least detect the problem in work completion
handler there by avoiding sending more data to
inactive remote.

Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d655a9fb 20-May-2015 Wengang Wang <wen.gang.wang@oracle.com>

rds: re-entry of rds_ib_xmit/rds_iw_xmit

The BUG_ON at line 452/453 is triggered in function rds_send_xmit.

441 while (ret) {
442 tmp = min_t(int, ret, sg->length -
443 conn->c_xmit_data_off);
444 conn->c_xmit_data_off += tmp;
445 ret -= tmp;
446 if (conn->c_xmit_data_off == sg->length) {
447 conn->c_xmit_data_off = 0;
448 sg++;
449 conn->c_xmit_sg++;
450 if (ret != 0 && conn->c_xmit_sg == rm->data.op_nents)
451 printk(KERN_ERR "conn %p rm %p sg %p ret %d\n", conn, rm, sg, ret);
452 BUG_ON(ret != 0 &&
453 conn->c_xmit_sg == rm->data.op_nents);
454 }
455 }

it is complaining the total sent length is bigger that we want to send.

rds_ib_xmit() is wrong for the second entry for the same rds_message returning
wrong value.

the sg and off passed by rds_send_xmit to rds_ib_xmit is based on
scatterlist.offset/length, but the rds_ib_xmit action is based on
scatterlist.dma_address/dma_length. in case dma_length is larger than length
there is problem. for the 2nd and later entries of rds_ib_xmit for same
rds_message, at least one of the following two is wrong:

1) the scatterlist to start with, the choosen one can far beyond the correct
one.
2) the offset to start with within the scatterlist.

fix:
add op_dmasg and op_dmaoff to rm_data_op structure indicating the scatterlist
and offset within the it to start with for rds_ib_xmit respectively. op_dmasg
and op_dmaoff are initialized to zero when doing dma mapping for the first see
of the message and are changed when filling send slots.

the same applies to rds_iw_xmit too.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 3c88f3dc 18-May-2015 Sagi Grimberg <sagig@mellanox.com>

RDS: Switch to generic logging helpers

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 11ac1199 05-Feb-2015 Rasmus Villemoes <linux@rasmusvillemoes.dk>

net: rds: Remove repeated function names from debug output

The macro rdsdebug is defined as

pr_debug("%s(): " fmt, __func__ , ##args)

Hence it doesn't make sense to include the name of the calling
function explicitly in the format string passed to rdsdebug.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 71fd762f 18-May-2014 Manuel Schölling <manuel.schoelling@gmx.de>

net: rds: Use time_after() for time comparison

To be future-proof and for better readability the time comparisons are modified
to use time_after() instead of raw math.

Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 18fc25c9 02-Dec-2013 Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>

rds: prevent BUG_ON triggered on congestion update to loopback

After congestion update on a local connection, when rds_ib_xmit returns
less bytes than that are there in the message, rds_send_xmit calls
back rds_ib_xmit with an offset that causes BUG_ON(off & RDS_FRAG_SIZE)
to trigger.

For a 4Kb PAGE_SIZE rds_ib_xmit returns min(8240,4096)=4096 when actually
the message contains 8240 bytes. rds_send_xmit thinks there is more to send
and calls rds_ib_xmit again with a data offset "off" of 4096-48(rds header)
=4048 bytes thus hitting the BUG_ON(off & RDS_FRAG_SIZE) [RDS_FRAG_SIZE=4k].

The commit 6094628bfd94323fc1cea05ec2c6affd98c18f7f
"rds: prevent BUG_ON triggering on congestion map updates" introduced
this regression. That change was addressing the triggering of a different
BUG_ON in rds_send_xmit() on PowerPC architecture with 64Kbytes PAGE_SIZE:
BUG_ON(ret != 0 &&
conn->c_xmit_sg == rm->data.op_nents);
This was the sequence it was going through:
(rds_ib_xmit)
/* Do not send cong updates to IB loopback */
if (conn->c_loopback
&& rm->m_inc.i_hdr.h_flags & RDS_FLAG_CONG_BITMAP) {
rds_cong_map_updated(conn->c_fcong, ~(u64) 0);
return sizeof(struct rds_header) + RDS_CONG_MAP_BYTES;
}
rds_ib_xmit returns 8240
rds_send_xmit:
c_xmit_data_off = 0 + 8240 - 48 (rds header accounted only the first time)
= 8192
c_xmit_data_off < 65536 (sg->length), so calls rds_ib_xmit again
rds_ib_xmit returns 8240
rds_send_xmit:
c_xmit_data_off = 8192 + 8240 = 16432, calls rds_ib_xmit again
and so on (c_xmit_data_off 24672,32912,41152,49392,57632)
rds_ib_xmit returns 8240
On this iteration this sequence causes the BUG_ON in rds_send_xmit:
while (ret) {
tmp = min_t(int, ret, sg->length - conn->c_xmit_data_off);
[tmp = 65536 - 57632 = 7904]
conn->c_xmit_data_off += tmp;
[c_xmit_data_off = 57632 + 7904 = 65536]
ret -= tmp;
[ret = 8240 - 7904 = 336]
if (conn->c_xmit_data_off == sg->length) {
conn->c_xmit_data_off = 0;
sg++;
conn->c_xmit_sg++;
BUG_ON(ret != 0 &&
conn->c_xmit_sg == rm->data.op_nents);
[c_xmit_sg = 1, rm->data.op_nents = 1]

What the current fix does:
Since the congestion update over loopback is not actually transmitted
as a message, all that rds_ib_xmit needs to do is let the caller think
the full message has been transmitted and not return partial bytes.
It will return 8240 (RDS_CONG_MAP_BYTES+48) when PAGE_SIZE is 4Kb.
And 64Kb+48 when page size is 64Kb.

Reported-by: Josh Hunt <joshhunt00@gmail.com>
Tested-by: Honggang Li <honli@redhat.com>
Acked-by: Bang Nguyen <bang.nguyen@oracle.com>
Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cb0a6056 15-Jun-2011 Manuel Zerpies <manuel.f.zerpies@ww.stud.uni-erlangen.de>

net/rds: use prink_ratelimited() instead of printk_ratelimit()

Since printk_ratelimit() shouldn't be used anymore (see comment in
include/linux/printk.h), replace it with printk_ratelimited()

Signed-off-by: Manuel Zerpies <manuel.f.zerpies@ww.stud.uni-erlangen.de>
Signed-off-by: David S. Miller <davem@conan.davemloft.net>


# 25985edc 30-Mar-2011 Lucas De Marchi <lucas.demarchi@profusion.mobi>

Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>


# 6094628b 01-Mar-2011 Neil Horman <nhorman@tuxdriver.com>

rds: prevent BUG_ON triggering on congestion map updates

Recently had this bug halt reported to me:

kernel BUG at net/rds/send.c:329!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=1024 NUMA pSeries
Modules linked in: rds sunrpc ipv6 dm_mirror dm_region_hash dm_log ibmveth sg
ext4 jbd2 mbcache sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt
dm_mod [last unloaded: scsi_wait_scan]
NIP: d000000003ca68f4 LR: d000000003ca67fc CTR: d000000003ca8770
REGS: c000000175cab980 TRAP: 0700 Not tainted (2.6.32-118.el6.ppc64)
MSR: 8000000000029032 <EE,ME,CE,IR,DR> CR: 44000022 XER: 00000000
TASK = c00000017586ec90[1896] 'krdsd' THREAD: c000000175ca8000 CPU: 0
GPR00: 0000000000000150 c000000175cabc00 d000000003cb7340 0000000000002030
GPR04: ffffffffffffffff 0000000000000030 0000000000000000 0000000000000030
GPR08: 0000000000000001 0000000000000001 c0000001756b1e30 0000000000010000
GPR12: d000000003caac90 c000000000fa2500 c0000001742b2858 c0000001742b2a00
GPR16: c0000001742b2a08 c0000001742b2820 0000000000000001 0000000000000001
GPR20: 0000000000000040 c0000001742b2814 c000000175cabc70 0800000000000000
GPR24: 0000000000000004 0200000000000000 0000000000000000 c0000001742b2860
GPR28: 0000000000000000 c0000001756b1c80 d000000003cb68e8 c0000001742b27b8
NIP [d000000003ca68f4] .rds_send_xmit+0x4c4/0x8a0 [rds]
LR [d000000003ca67fc] .rds_send_xmit+0x3cc/0x8a0 [rds]
Call Trace:
[c000000175cabc00] [d000000003ca67fc] .rds_send_xmit+0x3cc/0x8a0 [rds]
(unreliable)
[c000000175cabd30] [d000000003ca7e64] .rds_send_worker+0x54/0x100 [rds]
[c000000175cabdb0] [c0000000000b475c] .worker_thread+0x1dc/0x3c0
[c000000175cabed0] [c0000000000baa9c] .kthread+0xbc/0xd0
[c000000175cabf90] [c000000000032114] .kernel_thread+0x54/0x70
Instruction dump:
4bfffd50 60000000 60000000 39080001 935f004c f91f0040 41820024 813d017c
7d094a78 7d290074 7929d182 394a0020 <0b090000> 40e2ff68 4bffffa4 39200000
Kernel panic - not syncing: Fatal exception
Call Trace:
[c000000175cab560] [c000000000012e04] .show_stack+0x74/0x1c0 (unreliable)
[c000000175cab610] [c0000000005a365c] .panic+0x80/0x1b4
[c000000175cab6a0] [c00000000002fbcc] .die+0x21c/0x2a0
[c000000175cab750] [c000000000030000] ._exception+0x110/0x220
[c000000175cab910] [c000000000004b9c] program_check_common+0x11c/0x180

Signed-off-by: David S. Miller <davem@davemloft.net>


# 20c72bd5 25-Aug-2010 Andy Grover <andy.grover@oracle.com>

RDS: Implement masked atomic operations

Add two CMSGs for masked versions of cswp and fadd. args
struct modified to use a union for different atomic op type's
arguments. Change IB to do masked atomic ops. Atomic op type
in rds_message similarly unionized.

Signed-off-by: Andy Grover <andy.grover@oracle.com>


# 59f740a6 03-Aug-2010 Zach Brown <zach.brown@oracle.com>

RDS/IB: print string constants in more places

This prints the constant identifier for work completion status and rdma
cm event types, like we already do for IB event types.

A core string array helper is added that each string type uses.

Signed-off-by: Zach Brown <zach.brown@oracle.com>


# f046011c 14-Jul-2010 Zach Brown <zach.brown@oracle.com>

RDS/IB: track signaled sends

We're seeing bugs today where IB connection shutdown clears the send
ring while the tasklet is processing completed sends. Implementation
details cause this to dereference a null pointer. Shutdown needs to
wait for send completion to stop before tearing down the connection. We
can't simply wait for the ring to empty because it may contain
unsignaled sends that will never be processed.

This patch tracks the number of signaled sends that we've posted and
waits for them to complete. It also makes sure that the tasklet has
finished executing.

Signed-off-by: Zach Brown <zach.brown@oracle.com>


# 0f4b1c7e 04-Jun-2010 Zach Brown <zach.brown@oracle.com>

rds: fix rds_send_xmit() serialization

rds_send_xmit() was changed to hold an interrupt masking spinlock instead of a
mutex so that it could be called from the IB receive tasklet path. This broke
the TCP transport because its xmit method can block and masks and unmasks
interrupts.

This patch serializes callers to rds_send_xmit() with a simple bit instead of
the current spinlock or previous mutex. This enables rds_send_xmit() to be
called from any context and to call functions which block. Getting rid of the
c_send_lock exposes the bare c_lock acquisitions which are changed to block
interrupts.

A waitqueue is added so that rds_conn_shutdown() can wait for callers to leave
rds_send_xmit() before tearing down partial send state. This lets us get rid
of c_senders.

rds_send_xmit() is changed to check the conn state after acquiring the
RDS_IN_XMIT bit to resolve races with the shutdown path. Previously both
worked with the conn state and then the lock in the same order, allowing them
to race and execute the paths concurrently.

rds_send_reset() isn't racing with rds_send_xmit() now that rds_conn_shutdown()
properly ensures that rds_send_xmit() can't start once the conn state has been
changed. We can remove its previous use of the spinlock.

Finally, c_send_generation is redundant. Callers can race to test the c_flags
bit by simply retrying instead of racing to test the c_send_generation atomic.

Signed-off-by: Zach Brown <zach.brown@oracle.com>


# 89bf9d41 18-May-2010 Zach Brown <zach.brown@oracle.com>

RDS/IB: get the xmit max_sge from the RDS IB device on the connection

rds_ib_xmit_rdma() was calling ib_get_client_data() to get at the rds_ibdevice
just to get the max_sge for the transmit. This patch instead has it get it
directly off the rds_ibdev which is stored on the connection.

The current code won't free the rds_ibdev until all the IB connections that use
it are freed. So it's safe to reference the rds_ibdev this way. In the future
it also makes it easier to support proper reference counting of the rds_ibdev
struct.

As an additional bonus, this gets rid of the performance hit of calling in to
the IB stack to look up the rds_ibdev. The current implementation in the IB
stack acquires an interrupt blocking spinlock to protect the registration of
client callback data.

Signed-off-by: Zach Brown <zach.brown@oracle.com>


# 1cc2228c 11-May-2010 Chris Mason <chris.mason@oracle.com>

rds: Fix reference counting on the for xmit_atomic and xmit_rdma

This makes sure we have the proper number of references in
rds_ib_xmit_atomic and rds_ib_xmit_rdma. We also consistently
drop references the same way for all message types as the IOs end.

Signed-off-by: Chris Mason <chris.mason@oracle.com>


# c9e65383 11-May-2010 Chris Mason <chris.mason@oracle.com>

rds: Fix RDMA message reference counting

The RDS send_xmit code was trying to get fancy with message
counting and was dropping the final reference on the RDMA messages
too early. This resulted in memory corruption and oopsen.

The fix here is to always add a ref as the parts of the message passes
through rds_send_xmit, and always drop a ref as the parts of the message
go through completion handling.

Signed-off-by: Chris Mason <chris.mason@oracle.com>


# 51e2cba8 29-Mar-2010 Andy Grover <andy.grover@oracle.com>

RDS: Move atomic stats from general to ib-specific area

Signed-off-by: Andy Grover <andy.grover@oracle.com>


# ff3d7d36 01-Mar-2010 Andy Grover <andy.grover@oracle.com>

RDS: Perform unmapping ops in stages

Previously, RDS would wait until the final send WR had completed
and then handle cleanup. With silent ops, we do not know
if an atomic, rdma, or data op will be last. This patch
handles any of these cases by keeping a pointer to the last
op in the message in m_last_op.

When the TX completion event fires, rds dispatches to per-op-type
cleanup functions, and then does whole-message cleanup, if the
last op equalled m_last_op.

This patch also moves towards having op-specific functions take
the op struct, instead of the overall rm struct.

rds_ib_connection has a pointer to keep track of a a partially-
completed data send operation. This patch changes it from an
rds_message pointer to the narrower rm_data_op pointer, and
modifies places that use this pointer as needed.

Signed-off-by: Andy Grover <andy.grover@oracle.com>


# 6c7cc6e4 27-Jan-2010 Andy Grover <andy.grover@oracle.com>

RDS: Rename data op members prefix from m_ to op_

For consistency.

Signed-off-by: Andy Grover <andy.grover@oracle.com>


# f8b3aaf2 01-Mar-2010 Andy Grover <andy.grover@oracle.com>

RDS: Remove struct rds_rdma_op

A big changeset, but it's all pretty dumb.

struct rds_rdma_op was already embedded in struct rm_rdma_op.
Remove rds_rdma_op and put its members in rm_rdma_op. Rename
members with "op_" prefix instead of "r_", for consistency.

Of course this breaks a lot, so fixup the code accordingly.

Signed-off-by: Andy Grover <andy.grover@oracle.com>


# 241eef3e 19-Jan-2010 Andy Grover <andy.grover@oracle.com>

RDS: Implement silent atomics

Signed-off-by: Andy Grover <andy.grover@oracle.com>


# c8de3f10 15-Jan-2010 Andy Grover <andy.grover@oracle.com>

RDS/IB: Make all flow control code conditional on i_flowctl

Maybe things worked fine with the flow control code running
even in the non-flow-control case, but making it explicitly
conditional helps the non-fc case be easier to read.

Signed-off-by: Andy Grover <andy.grover@oracle.com>


# 1d34f175 14-Jan-2010 Andy Grover <andy.grover@oracle.com>

RDS: Remove unsignaled_bytes sysctl

Removed unsignaled_bytes sysctl and code to signal
based on it. I believe unsignaled_wrs is more than
sufficient for our purposes.

Signed-off-by: Andy Grover <andy.grover@oracle.com>


# da5a06ce 14-Jan-2010 Andy Grover <andy.grover@oracle.com>

RDS: rewrite rds_ib_xmit

Now that the header always goes first, it is possible to
simplify rds_ib_xmit. Instead of having a path to handle 0-byte
dgrams and another path to handle >0, these can both be handled
in one path. This lets us eliminate xmit_populate_wr().

Rename sent to bytes_sent, to differentiate better from other
variable named "send".

Signed-off-by: Andy Grover <andy.grover@oracle.com>


# 919ced4c 13-Jan-2010 Andy Grover <andy.grover@oracle.com>

RDS/IB: Remove ib_[header/data]_sge() functions

These functions were to cope with differently ordered
sg entries depending on RDS 3.0 or 3.1+. Now that
we've dropped 3.0 compatibility we no longer need them.

Also, modify usage sites for these to refer to sge[0] or [1]
directly. Reorder code to initialize header sgs first.

Signed-off-by: Andy Grover <andy.grover@oracle.com>


# 6f3d05db 13-Jan-2010 Andy Grover <andy.grover@oracle.com>

RDS/IB: Remove dead code

Signed-off-by: Andy Grover <andy.grover@oracle.com>


# 9c030391 12-Jan-2010 Andy Grover <andy.grover@oracle.com>

RDS/IB: eliminate duplicate code

both atomics and rdmas need to convert ib-specific completion codes
into RDS status codes. Rename rds_ib_rdma_send_complete to
rds_ib_send_complete, and have it take a pointer to the function to
call with the new error code.

Signed-off-by: Andy Grover <andy.grover@oracle.com>


# 15133f6e 12-Jan-2010 Andy Grover <andy.grover@oracle.com>

RDS: Implement atomic operations

Implement a CMSG-based interface to do FADD and CSWP ops.

Alter send routines to handle atomic ops.

Add atomic counters to stats.

Add xmit_atomic() to struct rds_transport

Inline rds_ib_send_unmap_rdma into unmap_rm

Signed-off-by: Andy Grover <andy.grover@oracle.com>


# ff87e97a 12-Jan-2010 Andy Grover <andy.grover@oracle.com>

RDS: make m_rdma_op a member of rds_message

This eliminates a separate memory alloc, although
it is now necessary to add an "r_active" flag, since
it is no longer to use the m_rdma_op pointer as an
indicator of if an rdma op is present.

rdma SGs allocated from rm sg pool.

rds_rm_size also gets bigger. It's a little inefficient to
run through CMSGs twice, but it makes later steps a lot smoother.

Signed-off-by: Andy Grover <andy.grover@oracle.com>


# 21f79afa 12-Jan-2010 Andy Grover <andy.grover@oracle.com>

RDS: fold rdma.h into rds.h

RDMA is now an intrinsic part of RDS, so it's easier to just have
a single header.

Signed-off-by: Andy Grover <andy.grover@oracle.com>


# e779137a 12-Jan-2010 Andy Grover <andy.grover@oracle.com>

RDS: break out rdma and data ops into nested structs in rds_message

Clearly separate rdma-related variables in rm from data-related ones.
This is in anticipation of adding atomic support.

Signed-off-by: Andy Grover <andy.grover@oracle.com>


# 8690bfa1 12-Jan-2010 Andy Grover <andy.grover@oracle.com>

RDS: cleanup: remove "== NULL"s and "!= NULL"s in ptr comparisons

Favor "if (foo)" style over "if (foo != NULL)".

Signed-off-by: Andy Grover <andy.grover@oracle.com>


# 450d06c0 11-Mar-2010 Sherman Pun <sherman.pun@sun.com>

RDS: Properly unmap when getting a remote access error

If the RDMA op has aborted with a remote access error,
in addition to what we already do (tell userspace it has
completed with an error) also unmap it and put() the rm.

Otherwise, hangs may occur on arches that track maps and
will not exit without proper cleanup.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2e7b3b99 11-Mar-2010 Andy Grover <andy.grover@oracle.com>

RDS: Fix congestion issues for loopback

We have two kinds of loopback: software (via loop transport)
and hardware (via IB). sw is used for 127.0.0.1, and doesn't
support rdma ops. hw is used for sends to local device IPs,
and supports rdma. Both are used in different cases.

For both of these, when there is a congestion map update, we
want to call rds_cong_map_updated() but not actually send
anything -- since loopback local and foreign congestion maps
point to the same spot, they're already in sync.

The old code never called sw loop's xmit_cong_map(),so
rds_cong_map_updated() wasn't being called for it. sw loop
ports would not work right with the congestion monitor.

Fixing that meant that hw loopback now would send congestion maps
to itself. This is also undesirable (racy), so we check for this
case in the ib-specific xmit code.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 735f61e6 11-Mar-2010 Andy Grover <andy.grover@oracle.com>

RDS: Do not BUG() on error returned from ib_post_send

BUGging on a runtime error code should be avoided. This
patch also eliminates all other BUG()s that have no real
reason to exist.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f64f9e71 29-Nov-2009 Joe Perches <joe@perches.com>

net: Move && and || to end of previous line

Not including net/atm/

Compiled tested x86 allyesconfig only
Added a > 80 column line or two, which I ignored.
Existing checkpatch plaints willfully, cheerfully ignored.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7b70d033 09-Apr-2009 Steve Wise <larrystevenwise@gmail.com>

RDS/IW+IB: Allow max credit advertise window.

Fix hack that restricts the credit advertisement to 127.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d39e0602 09-Apr-2009 Steve Wise <larrystevenwise@gmail.com>

RDS/IW+IB: Set the RDS_LL_SEND_FULL bit when we're throttled.

The RDS_LL_SEND_FULL bit should be set when we stop transmitted due to
flow control. Otherwise the send worker will keep trying as opposed to
sleeping until we unthrottle. Saves CPU.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6a0979df 24-Feb-2009 Andy Grover <andy.grover@oracle.com>

RDS/IB: Implement IB-specific datagram send.

Specific to IB is a credits-based flow control mechanism, in
addition to the expected usage of the IB API to package outgoing
data into work requests.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>