History log of /linux-master/drivers/infiniband/hw/cxgb4/cq.c
Revision Date Author Comments
# cab30a98 15-Dec-2022 Alexey Kodanev <aleksei.kodanev@bell-sw.com>

RDMA/cxgb4: remove unnecessary NULL check in __c4iw_poll_cq_one()

If 'qhp' is NULL then 'wq' is also NULL:

struct t4_wq *wq = qhp ? &qhp->wq : NULL;
...
ret = poll_cq(wq, ...);
if (ret)
goto out;

poll_cq(wq, ...) always returns a non-zero status if 'wq' is NULL,
either on a t4_next_cqe() error or on a 'wq == NULL' check.

Therefore, checking 'qhp' again after poll_cq() is redundant.

BTW, there're also 'qhp' dereference cases below poll_cq() without
any checks (c4iw_invalidate_mr(qhp->rhp,...)).

Detected using the static analysis tool - Svace.

Fixes: 4ab39e2f98f2 ("RDMA/cxgb4: Make c4iw_poll_cq_one() easier to analyze")
Signed-off-by: Alexey Kodanev <aleksei.kodanev@bell-sw.com>
Link: https://lore.kernel.org/r/20221215123030.155378-1-aleksei.kodanev@bell-sw.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# 2638a323 05-Aug-2021 Dakshaja Uppalapati <dakshaja@chelsio.com>

RDMA/iw_cxgb4: Fix refcount underflow while destroying cqs.

Previous atomic increment/decrement logic expects the atomic count to be
'0' after the final decrement.

Replacing atomic count with refcount does not allow that, as
refcount_dec() considers count of 1 as underflow and triggers a kernel
splat.

Fix the current refcount logic by using the usual pattern of decrementing
the refcount and test if it is '0' on the final deref in
c4iw_destroy_cq(). Use wait_for_completion() instead of wait_event().

Fixes: 7183451f846d ("RDMA/cxgb4: Use refcount_t instead of atomic_t for reference counting")
Link: https://lore.kernel.org/r/1628167412-12114-1-git-send-email-dakshaja@chelsio.com
Signed-off-by: Dakshaja Uppalapati <dakshaja@chelsio.com>
Reviewed-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 7183451f 28-May-2021 Weihang Li <liweihang@huawei.com>

RDMA/cxgb4: Use refcount_t instead of atomic_t for reference counting

The refcount_t API will WARN on underflow and overflow of a reference
counter, and avoid use-after-free risks.

Link: https://lore.kernel.org/r/1622194663-2383-12-git-send-email-liweihang@huawei.com
Cc: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 6d8285e6 08-Nov-2020 Kamal Heib <kamalheib1@gmail.com>

RDMA/cxgb4: Validate the number of CQEs

Before create CQ, make sure that the requested number of CQEs is in the
supported range.

Fixes: cfdda9d76436 ("RDMA/cxgb4: Add driver for Chelsio T4 RNIC")
Link: https://lore.kernel.org/r/20201108132007.67537-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 1c407cb5 03-Oct-2020 Jason Gunthorpe <jgg@ziepe.ca>

RDMA: Check flags during create_cq

Each driver should check that the CQ attrs is supported. Unfortuantely
when flags was added to the CQ attrs the drivers were not updated,
uverbs_ex_cmd_mask was used to block it. This was missed when create CQ
was converted to ioctl, so non-zero flags could have been passed into
drivers.

Check that flags is zero in all drivers that don't use it, remove
IB_USER_VERBS_EX_CMD_CREATE_CQ from uverbs_ex_cmd_mask.

Fixes: 41b2a71fc848 ("IB/uverbs: Move ioctl path of create_cq and destroy_cq to a new file")
Link: https://lore.kernel.org/r/7-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 43d781b9 07-Sep-2020 Leon Romanovsky <leon@kernel.org>

RDMA: Allow fail of destroy CQ

Like any other verbs objects, CQ shouldn't fail during destroy, but
mlx5_ib didn't follow this contract with mixed IB verbs objects with
DEVX. Such mix causes to the situation where FW and kernel are fully
interdependent on the reference counting of each side.

Kernel verbs and drivers that don't have DEVX flows shouldn't fail.

Fixes: e39afe3d6dbd ("RDMA: Convert CQ allocations to be under core responsibility")
Link: https://lore.kernel.org/r/20200907120921.476363-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 3f649ab7 03-Jun-2020 Kees Cook <keescook@chromium.org>

treewide: Remove uninitialized_var() usage

Using uninitialized_var() is dangerous as it papers over real bugs[1]
(or can in the future), and suppresses unrelated compiler warnings
(e.g. "unused variable"). If the compiler thinks it is uninitialized,
either simply initialize the variable or make compiler changes.

In preparation for removing[2] the[3] macro[4], remove all remaining
needless uses with the following script:

git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
xargs perl -pi -e \
's/\buninitialized_var\(([^\)]+)\)/\1/g;
s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
pathological white-space.

No outstanding warnings were found building allmodconfig with GCC 9.3.0
for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
alpha, and m68k.

[1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
[2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5
Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB
Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers
Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs
Signed-off-by: Kees Cook <keescook@chromium.org>


# 4c44d463 27-Jun-2019 Fuqian Huang <huangfq.daxian@gmail.com>

IB: Remove unneeded memset

In commit af7ddd8a627c ("Merge tag 'dma-mapping-4.21' of
git://git.infradead.org/users/hch/dma-mapping"),
dma_alloc_coherent/dmam_alloc_coherent always zeroed the returned memory.
So the memset after a coherent allocation function is not needed.

Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# e39afe3d 28-May-2019 Leon Romanovsky <leon@kernel.org>

RDMA: Convert CQ allocations to be under core responsibility

Ensure that CQ is allocated and freed by IB/core and not by drivers.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Tested-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# a52c8e24 28-May-2019 Leon Romanovsky <leon@kernel.org>

RDMA: Clean destroy CQ in drivers do not return errors

Like all other destroy commands, .destroy_cq() call is not supposed
to fail. In all flows, the attempt to return earlier caused to memory
leaks.

This patch converts .destroy_cq() to do not return any errors.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Gal Pressman <galpress@amazon.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# cae626b9 20-May-2019 Leon Romanovsky <leon@kernel.org>

RDMA/cxgb4: Don't expose DMA addresses

Change unconditional print of DMA address to be printed with special
printk format type specifier.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 34d56893 20-May-2019 Leon Romanovsky <leon@kernel.org>

RDMA/cxgb4: Use sizeof() notation

Convert various sizeof call sites to be written in standard format
sizeof().

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# ff23dfa1 31-Mar-2019 Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

IB: Pass only ib_udata in function prototypes

Now when ib_udata is passed to all the driver's object create/destroy APIs
the ib_udata will carry the ib_ucontext for every user command. There is
no need to also pass the ib_ucontext via the functions prototypes.

Make ib_udata the only argument psssed.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# bdeacabd 31-Mar-2019 Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

IB: Remove 'uobject->context' dependency in object destroy APIs

Now that we have the udata passed to all the ib_xxx object destroy APIs
and the additional macro 'rdma_udata_to_drv_context' to get the
ib_ucontext from ib_udata stored in uverbs_attr_bundle, we can finally
start to remove the dependency of the drivers in the
ib_xxx->uobject->context.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# c4367a26 31-Mar-2019 Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

IB: Pass uverbs_attr_bundle down ib_x destroy path

The uverbs_attr_bundle with the ucontext is sent down to the drivers ib_x
destroy path as ib_udata. The next patch will use the ib_udata to free the
drivers destroy path from the dependency in 'uobject->context' as we
already did for the create path.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 52e124c2 20-Feb-2019 Matthew Wilcox <willy@infradead.org>

cxgb4: Convert cqidr to XArray

Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 1b571086 24-Sep-2018 Nathan Chancellor <nathan@kernel.org>

iw_cxgb4: Use proper enumerated type in c4iw_bar2_addrs

Clang warns when one enumerated type is implicitly converted to another.

drivers/infiniband/hw/cxgb4/qp.c:287:8: warning: implicit conversion
from enumeration type 'enum t4_bar2_qtype' to different enumeration type
'enum cxgb4_bar2_qtype' [-Wenum-conversion]
T4_BAR2_QTYPE_EGRESS,
^~~~~~~~~~~~~~~~~~~~

c4iw_bar2_addrs expects a value from enum cxgb4_bar2_qtype so use the
corresponding values from that type so Clang is satisfied without changing
the meaning of the code.

T4_BAR2_QTYPE_EGRESS = CXGB4_BAR2_QTYPE_EGRESS = 0
T4_BAR2_QTYPE_INGRESS = CXGB4_BAR2_QTYPE_INGRESS = 1

Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# b9855f4c 02-Aug-2018 Potnuri Bharat Teja <bharat@chelsio.com>

iw_cxgb4: RDMA write with immediate support

Adds iw_cxgb4 functionality to support RDMA_WRITE_WITH_IMMEDATE opcode.

Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 1ffba626 27-Jul-2018 Kamal Heib <kamalheib1@gmail.com>

RDMA/providers: Remove pointless functions

The rdma core is taking care of return the right error code when the
rdma device callbacks aren't supported.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 6a0b6174 25-Jul-2018 Raju Rangoju <rajur@chelsio.com>

rdma/cxgb4: Add support for kernel mode SRQ's

This patch implements the srq specific verbs such as create/destroy/modify
and post_srq_recv. And adds srq specific structures and defines to t4.h
and uapi.

Also updates the cq poll logic to deal with completions that are
associated with the SRQ's.

This patch also handles kernel mode SRQ_LIMIT events as well as flushed
SRQ buffers

Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 65ca8d96 05-Jul-2018 Raju Rangoju <rajur@chelsio.com>

rdma/cxgb4: Add support for 64Byte cqes

This patch adds support for iw_cxb4 to extend cqes from existing 32Byte
size to 64Byte.

Also includes adds backward compatibility support (for 32Byte) to work
with older libraries.

Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 23ff6ba8 10-Jul-2018 Jason Gunthorpe <jgg@ziepe.ca>

RDMA/cxgb4: Restore the dropped uninitialized_var

In some configurations even gcc 7 cannot unravel this complexity and still
throws a warning.

Fixes: 4ab39e2f98f2 ("RDMA/cxgb4: Make c4iw_poll_cq_one() easier to analyze")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 4ab39e2f 06-Jul-2018 Bart Van Assche <bvanassche@acm.org>

RDMA/cxgb4: Make c4iw_poll_cq_one() easier to analyze

Introduce the function __c4iw_poll_cq_one() such that c4iw_poll_cq_one()
becomes easier to analyze for static source code analyzers. This patch
avoids that sparse reports the following:

drivers/infiniband/hw/cxgb4/cq.c:401:36: warning: context imbalance in 'c4iw_flush_hw_cq' - unexpected unlock
drivers/infiniband/hw/cxgb4/cq.c:824:9: warning: context imbalance in 'c4iw_poll_cq_one' - different lock contexts for basic block

Compile-tested only.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Steve Wise <swise@chelsio.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 2df19e19 27-Apr-2018 Bharat Potnuri <bharat@chelsio.com>

iw_cxgb4: Atomically flush per QP HW CQEs

When a CQ is shared by multiple QPs, c4iw_flush_hw_cq() needs to acquire
corresponding QP lock before moving the CQEs into its corresponding SW
queue and accessing the SQ contents for completing a WR.
Ignore CQEs if corresponding QP is already flushed.

Cc: stable@vger.kernel.org
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 96a236ed 19-Dec-2017 Steve Wise <larrystevenwise@gmail.com>

iw_cxgb4: reflect the original WR opcode in drain cqes

The flush/drain logic was not retaining the original wr opcode in
its completion. This can cause problems if the application uses
the completion opcode to make decisions.

Use bit 10 of the CQE header word to indicate the CQE is a special
drain completion, and save the original WR opcode in the cqe header
opcode field.

Fixes: 4fe7c2962e11 ("iw_cxgb4: refactor sq/rq drain logic")

Cc: stable@vger.kernel.org
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# f55688c4 18-Dec-2017 Steve Wise <larrystevenwise@gmail.com>

iw_cxgb4: Only validate the MSN for successful completions

If the RECV CQE is in error, ignore the MSN check. This was causing
recvs that were flushed into the sw cq to be completed with the wrong
status (BAD_MSN instead of FLUSHED).

Cc: stable@vger.kernel.org
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# c058ecf6 27-Nov-2017 Steve Wise <larrystevenwise@gmail.com>

iw_cxgb4: only insert drain cqes if wq is flushed

Only insert our special drain CQEs to support ib_drain_sq/rq() after
the wq is flushed. Otherwise, existing but not yet polled CQEs can be
returned out of order to the user application. This can happen when the
QP has exited RTS but not yet flushed the QP, which can happen during
a normal close (vs abortive close).

In addition never count the drain CQEs when determining how many CQEs
need to be synthesized during the flush operation. This latter issue
should never happen if the QP is properly flushed before inserting the
drain CQE, but I wanted to avoid corrupting the CQ state. So we handle
it and log a warning once.

Fixes: 4fe7c2962e11 ("iw_cxgb4: refactor sq/rq drain logic")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# ba97b749 02-Nov-2017 Steve Wise <larrystevenwise@gmail.com>

iw_cxgb4: remove BUG_ON() usage.

iw_cxgb4 has many BUG_ON()s that were left over from various enhancemnets
made over the years. Almost all of them should just be removed. Some,
however indicate a ULP usage error and can be handled w/o bringing down
the system.

If the condition cannot happen with correctly implemented cxgb4 sw/fw,
then remove the BUG_ON.

If the condition indicates a misbehaving ULP (like CQ overflows), add
proper recovery logic.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 2015f26c 26-Sep-2017 Steve Wise <larrystevenwise@gmail.com>

iw_cxgb4: add referencing to wait objects

For messages sent from the host to fw that solicit a reply from fw,
the c4iw_wr_wait struct pointer is passed in the host->fw message, and
included in the fw->host fw6_msg reply. This allows the sender to wait
until the reply is received, and the code processing the ingress reply
to wake up the sender.

If c4iw_wait_for_reply() times out, however, we need to keep the
c4iw_wr_wait object around in case the reply eventually does arrive.
Otherwise we have touch-after-free bugs in the wake_up paths.

This was hit due to a bad kernel driver that blocked ingress processing
of cxgb4 for a long time, causing iw_cxgb4 timeouts, but eventually
resuming ingress processing and thus hitting the touch-after-free bug.

So I want to fix iw_cxgb4 such that we'll at least keep the wait object
around until the reply comes. If it never comes we leak a small amount of
memory, but if it does come late, we won't potentially crash the system.

So add a kref struct in the c4iw_wr_wait struct, and take a reference
before sending a message to FW that will generate a FW6 reply. And remove
the reference (and potentially free the wait object) when the reply
is processed.

The ep code also uses the wr_wait for non FW6 CPL messages and doesn't
embed the c4iw_wr_wait object in the message sent to firmware. So for
those cases we add c4iw_wake_up_noref().

The mr/mw, cq, and qp object create/destroy paths do need this reference
logic. For these paths, c4iw_ref_send_wait() is introduced to take the
wr_wait reference, send the msg to fw, and then wait for the reply.

So going forward, iw_cxgb4 either uses c4iw_ofld_send(),
c4iw_wait_for_reply() and c4iw_wake_up_noref() like is done in the some
of the endpoint logic, or c4iw_ref_send_wait() and c4iw_wake_up_deref()
(formerly c4iw_wake_up()) when sending messages with the c4iw_wr_wait
object pointer embedded in the message and resulting FW6 reply.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 13ce8317 26-Sep-2017 Steve Wise <larrystevenwise@gmail.com>

iw_cxgb4: allocate wait object for each cq object

Remove the local stack allocated c4iw_wr_wait object in preparation for
correctly handling timeouts.

Also cleaned up some error path unwind logic to make it more readable.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 548ddb19 27-Sep-2017 Bharat Potnuri <bharat@chelsio.com>

iw_cxgb4: Remove __func__ parameter from pr_debug()

pr_debug() can be enabled to print function names, So removing the
unwanted __func__ parameters from debug logs.
Realign function parameters.

Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 6ebedacb 13-Jul-2017 Dan Carpenter <dan.carpenter@oracle.com>

cxgb4: Fix error codes in c4iw_create_cq()

If one of these kmalloc() calls fails then we return ERR_PTR(0) which is
NULL. It results in a NULL dereference in the callers.

Fixes: cfdda9d76436 ("RDMA/cxgb4: Add driver for Chelsio T4 RNIC")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# de77b966 18-Jun-2017 yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>

net: introduce __skb_put_[zero, data, u8]

follow Johannes Berg, semantic patch file as below,
@@
identifier p, p2;
expression len;
expression skb;
type t, t2;
@@
(
-p = __skb_put(skb, len);
+p = __skb_put_zero(skb, len);
|
-p = (t)__skb_put(skb, len);
+p = __skb_put_zero(skb, len);
)
... when != p
(
p2 = (t2)p;
-memset(p2, 0, len);
|
-memset(p, 0, len);
)

@@
identifier p;
expression len;
expression skb;
type t;
@@
(
-t p = __skb_put(skb, len);
+t p = __skb_put_zero(skb, len);
)
... when != p
(
-memset(p, 0, len);
)

@@
type t, t2;
identifier p, p2;
expression skb;
@@
t *p;
...
(
-p = __skb_put(skb, sizeof(t));
+p = __skb_put_zero(skb, sizeof(t));
|
-p = (t *)__skb_put(skb, sizeof(t));
+p = __skb_put_zero(skb, sizeof(t));
)
... when != p
(
p2 = (t2)p;
-memset(p2, 0, sizeof(*p));
|
-memset(p, 0, sizeof(*p));
)

@@
expression skb, len;
@@
-memset(__skb_put(skb, len), 0, len);
+__skb_put_zero(skb, len);

@@
expression skb, len, data;
@@
-memcpy(__skb_put(skb, len), data, len);
+__skb_put_data(skb, data, len);

@@
expression SKB, C, S;
typedef u8;
identifier fn = {__skb_put};
fresh identifier fn2 = fn ## "_u8";
@@
- *(u8 *)fn(SKB, S) = C;
+ fn2(SKB, C);

Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4df864c1 16-Jun-2017 Johannes Berg <johannes.berg@intel.com>

networking: make skb_put & friends return void pointers

It seems like a historic accident that these return unsigned char *,
and in many places that means casts are required, more often than not.

Make these functions (skb_put, __skb_put and pskb_put) return void *
and remove all the casts across the tree, adding a (u8 *) cast only
where the unsigned char pointer was used directly, all done with the
following spatch:

@@
expression SKB, LEN;
typedef u8;
identifier fn = { skb_put, __skb_put };
@@
- *(fn(SKB, LEN))
+ *(u8 *)fn(SKB, LEN)

@@
expression E, SKB, LEN;
identifier fn = { skb_put, __skb_put };
type T;
@@
- E = ((T *)(fn(SKB, LEN)))
+ E = fn(SKB, LEN)

which actually doesn't cover pskb_put since there are only three
users overall.

A handful of stragglers were converted manually, notably a macro in
drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many
instances in net/bluetooth/hci_sock.c. In the former file, I also
had to fix one whitespace problem spatch introduced.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a9a42886 09-Feb-2017 Joe Perches <joe@perches.com>

cxgb4: Convert PDBG to pr_debug

Use a more typical logging style.

Miscellanea:

o Obsolete the c4iw_debug module parameter
o Coalesce formats
o Realign arguments

Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 700456bd 09-Feb-2017 Joe Perches <joe@perches.com>

cxgb4: Use more common logging style

Convert printks to pr_<level>

Miscellanea:

o Coalesce formats
o Realign arguments

Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 4fe7c296 22-Dec-2016 Steve Wise <larrystevenwise@gmail.com>

iw_cxgb4: refactor sq/rq drain logic

With the addition of the IB/Core drain API, iw_cxgb4 supported drain
by watching the CQs when the QP was out of RTS and signalling "drain
complete" when the last CQE is polled. This, however, doesn't fully
support the drain semantics. Namely, the drain logic is supposed to signal
"drain complete" only when the application has _processed_ the last CQE,
not just removed them from the CQ. Thus a small timing hole exists that
can cause touch after free type bugs in applications using the drain API
(nvmf, iSER, for example). So iw_cxgb4 needs a better solution.

The iWARP Verbs spec mandates that "_at some point_ after the QP is
moved to ERROR", the iWARP driver MUST synchronously fail post_send and
post_recv calls. iw_cxgb4 was currently not allowing any posts once the
QP is in ERROR. This was in part due to the fact that the HW queues for
the QP in ERROR state are disabled at this point, so there wasn't much
else to do but fail the post operation synchronously. This restriction
is what drove the first drain implementation in iw_cxgb4 that has the
above mentioned flaw.

This patch changes iw_cxgb4 to allow post_send and post_recv WRs after
the QP is moved to ERROR state for kernel mode users, thus still adhering
to the Verbs spec for user mode users, but allowing flush WRs for kernel
users. Since the HW queues are disabled, we just synthesize a CQE for
this post, queue it to the SW CQ, and then call the CQ event handler.
This enables proper drain operations for the various storage applications.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 5c6b2aaf 03-Nov-2016 Steve Wise <larrystevenwise@gmail.com>

iw_cxgb4: invalidate the mr when posting a read_w_inv wr

Also, rearrange things a bit to have a common c4iw_invalidate_mr()
function used everywhere that we need to invalidate.

Fixes: 49b53a93a64a ("iw_cxgb4: add fast-path for small REG_MR operations")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 49b53a93 16-Sep-2016 Steve Wise <larrystevenwise@gmail.com>

iw_cxgb4: add fast-path for small REG_MR operations

When processing a REG_MR work request, if fw supports the
FW_RI_NSMR_TPTE_WR work request, and if the page list for this
registration is <= 2 pages, and the current state of the mr is INVALID,
then use FW_RI_NSMR_TPTE_WR to pass down a fully populated TPTE for FW
to write. This avoids FW having to do an async read of the TPTE blocking
the SQ until the read completes.

To know if the current MR state is INVALID or not, iw_cxgb4 must track the
state of each fastreg MR. The c4iw_mr struct state is updated as REG_MR
and LOCAL_INV WRs are posted and completed, when a reg_mr is destroyed,
and when RECV completions are processed that include a local invalidation.

This optimization increases small IO IOPS for both iSER and NVMF.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# cff069b7 23-Aug-2016 Bharat Potnuri <bharat@chelsio.com>

iw_cxgb4: Fix cxgb4 arm CQ logic w/IB_CQ_REPORT_MISSED_EVENTS

Current cxgb4 arm CQ logic ignores IB_CQ_REPORT_MISSED_EVENTS for
request completion notification on a CQ. Due to this ib_poll_handler()
assumes all events polled and avoids further iopoll scheduling.

This patch adds logic to cxgb4 ib_req_notify_cq() handler to check if
CQ is not empty and return accordingly. Based on the return value of
ib_req_notify_cq() handler, ib_poll_handler() will schedule a run of
iopoll handler.

Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# dd6b0241 09-Jun-2016 Hariprasad S <hariprasad@chelsio.com>

RDMA/iw_cxgb4: Low resource fixes for Completion queue

Pre-allocate buffers to deallocate completion queue, so that completion
queue is deallocated during RDMA termination when system is running
out of memory.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 32cc92c7 04-Apr-2016 Hariprasad S <hariprasad@chelsio.com>

RDMA/iw_cxgb4: Fix bar2 virt addr calculation for T4 chips

For T4, kernel mode qps don't use the user doorbell. User mode qps during
flow control db ringing are forced into kernel, where user doorbell is
treated as kernel doorbell and proper bar2 offset in bar2 virtual space is
calculated, which incase of T4 is a bogus address, causing a kernel panic
due to illegal write during doorbell ringing.
In case of T4, kernel mode qp bar2 virtual address should be 0. Added T4
check during bar2 virtual address calculation to return 0. Fixed Bar2
range checks based on bar2 physical address.

The below oops will be fixed

<1>BUG: unable to handle kernel paging request at 000000000002aa08
<1>IP: [<ffffffffa011d800>] c4iw_uld_control+0x4e0/0x880 [iw_cxgb4]
<4>PGD 1416a8067 PUD 15bf35067 PMD 0
<4>Oops: 0002 [#1] SMP
<4>last sysfs file:
/sys/devices/pci0000:00/0000:00:03.0/0000:02:00.4/infiniband/cxgb4_0/node_guid
<4>CPU 5
<4>Modules linked in: rdma_ucm rdma_cm ib_cm ib_sa ib_mad ib_uverbs
ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE
iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack
ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge autofs4
target_core_iblock target_core_file target_core_pscsi target_core_mod
configfs bnx2fc cnic uio fcoe libfcoe libfc scsi_transport_fc scsi_tgt 8021q
garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf vhost_net macvtap
macvlan tun kvm uinput microcode iTCO_wdt iTCO_vendor_support sg joydev
serio_raw i2c_i801 i2c_core lpc_ich mfd_core e1000e ptp pps_core ioatdma dca
i7core_edac edac_core shpchp ext3 jbd mbcache sd_mod crc_t10dif pata_acpi
ata_generic ata_piix iw_cxgb4 iw_cm ib_core ib_addr cxgb4 ipv6 dm_mirror
dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>
Supermicro X8ST3/X8ST3
<4>RIP: 0010:[<ffffffffa011d800>] [<ffffffffa011d800>]
c4iw_uld_control+0x4e0/0x880 [iw_cxgb4]
<4>RSP: 0000:ffff880155a03db0 EFLAGS: 00010006
<4>RAX: 000000000000001d RBX: ffff88013ae5fc00 RCX: ffff880155adb180
<4>RDX: 000000000002aa00 RSI: 0000000000000001 RDI: ffff88013ae5fdf8
<4>RBP: ffff880155a03e10 R08: 0000000000000000 R09: 0000000000000001
<4>R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
<4>R13: 000000000000001d R14: ffff880156414ab0 R15: ffffe8ffffc05b88
<4>FS: 0000000000000000(0000) GS:ffff8800282a0000(0000) knlGS:0000000000000000
<4>CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>CR2: 000000000002aa08 CR3: 000000015bd0e000 CR4: 00000000000007e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process cxgb4 (pid: 394, threadinfo ffff880155a00000, task ffff880156414ab0)
<4>Stack:
<4> ffff880156415068 ffff880155adb180 ffff880155a03df0 ffffffffa00a344b
<4><d> 00000000000003e8 ffff880155920000 0000000000000004 ffff880155920000
<4><d> ffff88015592d438 ffffffffa00a3860 ffff880155a03fd8 ffffe8ffffc05b88
<4>Call Trace:
<4> [<ffffffffa00a344b>] ? enable_txq_db+0x2b/0x80 [cxgb4]
<4> [<ffffffffa00a3860>] ? process_db_full+0x0/0xa0 [cxgb4]
<4> [<ffffffffa00a38a6>] process_db_full+0x46/0xa0 [cxgb4]
<4> [<ffffffff8109fda0>] worker_thread+0x170/0x2a0
<4> [<ffffffff810a6aa0>] ? autoremove_wake_function+0x0/0x40
<4> [<ffffffff8109fc30>] ? worker_thread+0x0/0x2a0
<4> [<ffffffff810a660e>] kthread+0x9e/0xc0
<4> [<ffffffff8100c28a>] child_rip+0xa/0x20
<4> [<ffffffff810a6570>] ? kthread+0x0/0xc0
<4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
<4>Code: e9 ba 00 00 00 66 0f 1f 44 00 00 44 8b 05 29 07 02 00 45 85 c0 0f 85
71 02 00 00 8b 83 70 01 00 00 45 0f b7 ed c1 e0 0f 44 09 e8 <89> 42 08 0f ae f8
66 c7 83 82 01 00 00 00 00 44 0f b7 ab dc 01
<1>RIP [<ffffffffa011d800>] c4iw_uld_control+0x4e0/0x880 [iw_cxgb4]
<4> RSP <ffff880155a03db0>
<4>CR2: 000000000002aa08`

Based on original work by Bharat Potnuri <bharat@chelsio.com>

Fixes: 74217d4c6a4fb0d8 ("iw_cxgb4: support for bar2 qid densities exceeding the page size")

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Reviewed-by: Leon Romanovsky <leon@leon.nu>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 086dc6e3 17-Feb-2016 Steve Wise <larrystevenwise@gmail.com>

iw_cxgb4: add queue drain functions

Add completion objects, named sq_drained and rq_drained, to the c4iw_qp
struct. The queue-specific completion object is signaled when the last
CQE is drained from the CQ for that queue.

Add c4iw_drain_sq() to block until qp->rq_drained is completed.

Add c4iw_drain_rq() to block until qp->sq_drained is completed.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# feb7c1e3 23-Dec-2015 Christoph Hellwig <hch@lst.de>

IB: remove in-kernel support for memory windows

Remove the unused ib_allow_mw and ib_bind_mw functions, remove the
unused IB_WR_BIND_MW and IB_WC_BIND_MW opcodes and move ib_dealloc_mw
into the uverbs module.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# d3cfd002 13-Oct-2015 Sagi Grimberg <sagig@mellanox.com>

iw_cxgb4: Remove old FRWR API

No ULP uses it anymore, go ahead and remove it.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 3661df17 27-Jul-2015 Hariprasad S <hariprasad@chelsio.com>

iw_cxgb4: gracefully handle unknown CQE status errors

c4iw_poll_cq_on() shouldn't fail the poll operation just because
the CQE status is unknown. Rather, it should map this to the
"fatal error" status and log the anomaly.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# bcf4c1ea 11-Jun-2015 Matan Barak <matanb@mellanox.com>

IB/core: Change provider's API of create_cq to be extendible

Add a new ib_cq_init_attr structure which contains the
previous cqe (minimum number of CQ entries) and comp_vector
(completion vector) in addition to a new flags field.
All vendors' create_cq callbacks are changed in order
to work with the new API.

This commit does not change any functionality.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> to patch #2
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 74217d4c 09-Jun-2015 Hariprasad S <hariprasad@chelsio.com>

iw_cxgb4: support for bar2 qid densities exceeding the page size

Handle this configuration:

Queues Per Page * SGE BAR2 Queue Register Area Size > Page Size

Use cxgb4_bar2_sge_qregs() to obtain the proper location within the
bar2 region for a given qid.

Rework the DB and GTS write functions to make use of this bar2 info.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 09ece8b9 21-Apr-2015 Hariprasad S <hariprasad@chelsio.com>

iw_cxgb4: use BAR2 GTS register for T5 kernel mode CQs

For T5, we must not use the kdb/kgts registers, in order avoid db drops
under extreme loads.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 6198dd8d 21-Apr-2015 Hariprasad S <hariprasad@chelsio.com>

iw_cxgb4: 32b platform fixes

- get_dma_mr() was using ~0UL which is should be ~0ULL. This causes the
DMA MR to get setup incorrectly in hardware.

- wr_log_show() needed a 64b divide function div64_u64() instead of
doing
division directly.

- fixed warnings about recasting a pointer to a u64

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# cf7fe64a 15-Jan-2015 Hariprasad Shenai <hariprasad@chelsio.com>

iw_cxgb4: Cleanup register defines/MACROS defined in t4fw_ri_api.h

Cleanup all the MACROS that are defined in t4fw_ri_api.h and affected files

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a56c66e8 15-Jan-2015 Hariprasad Shenai <hariprasad@chelsio.com>

iw_cxgb4: Cleanup register defines/MACROS defined in t4.h

Cleanup all the MACROS defined in t4.h and the affected files

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e7fddaf0 15-Jan-2015 Hariprasad Shenai <hariprasad@chelsio.com>

iw_cxgb4: Cleanup register defines/MACROS defined in t4fw_ri_api.h

Cleanup all the MACROS that are defined in t4fw_ri_api.h and affected files

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6969de73 15-Jan-2015 Hariprasad Shenai <hariprasad@chelsio.com>

iw_cxgb4: Cleanup register defines/MACROS defined in t4.h

Cleanup all the MACROS defined in t4.h and the affected files

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e2ac9628 06-Nov-2014 Hariprasad Shenai <hariprasad@chelsio.com>

cxgb4: Cleanup macros so they follow the same style and look consistent, part 2

Various patches have ended up changing the style of the symbolic macros/register
defines to different style.

As a result, the current kernel.org files are a mix of different macro styles.
Since this macro/register defines is used by different drivers a
few patch series have ended up adding duplicate macro/register define entries
with different styles. This makes these register define/macro files a complete
mess and we want to make them clean and consistent. This patch cleans up a part
of it.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 66eb19af 21-Jul-2014 Hariprasad Shenai <hariprasad@chelsio.com>

iw_cxgb4: advertise the correct device max attributes

Advertise the actual max limits for things like qp depths, number of
qps, cqs, etc.

Clean up the queue allocation for qps and cqs.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7730b4c7 14-Jul-2014 Hariprasad Shenai <hariprasad@chelsio.com>

cxgb4/iw_cxgb4: work request logging feature

This commit enhances the iwarp driver to optionally keep a log of rdma
work request timining data for kernel mode QPs. If iw_cxgb4 module option
c4iw_wr_log is set to non-zero, each work request is tracked and timing
data maintained in a rolling log that is 4096 entries deep by default.
Module option c4iw_wr_log_size_order allows specifing a log2 size to use
instead of the default order of 12 (4096 entries). Both module options
are read-only and must be passed in at module load time to set them. IE:

modprobe iw_cxgb4 c4iw_wr_log=1 c4iw_wr_log_size_order=10

The timing data is viewable via the iw_cxgb4 debugfs file "wr_log".
Writing anything to this file will clear all the timing data.
Data tracked includes:

- The host time when the work request was posted, just before ringing
the doorbell. The host time when the completion was polled by the
application. This is also the time the log entry is created. The delta
of these two times is the amount of time took processing the work request.

- The qid of the EQ used to post the work request.

- The work request opcode.

- The cqe wr_id field. For sq completions requests this is the swsqe
index. For recv completions this is the MSN of the ingress SEND.
This value can be used to match log entries from this log with firmware
flowc event entries.

- The sge timestamp value just before ringing the doorbell when
posting, the sge timestamp value just after polling the completion,
and CQE.timestamp field from the completion itself. With these three
timestamps we can track the latency from post to poll, and the amount
of time the completion resided in the CQ before being reaped by the
application. With debug firmware, the sge timestamp is also logged by
firmware in its flowc history so that we can compute the latency from
posting the work request until the firmware sees it.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 04e10e21 14-Jul-2014 Hariprasad Shenai <hariprasad@chelsio.com>

iw_cxgb4: Detect Ing. Padding Boundary at run-time

Updates iw_cxgb4 to determine the Ingress Padding Boundary from
cxgb4_lld_info, and take subsequent actions.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cf38be6d 06-Jun-2014 Hariprasad Shenai <hariprasad@chelsio.com>

iw_cxgb4: Allocate and use IQs specifically for indirect interrupts

Currently indirect interrupts for RDMA CQs funnel through the LLD's RDMA
RXQs, which also handle direct interrupts for offload CPLs during RDMA
connection setup/teardown. The intended T4 usage model, however, is to
have indirect interrupts flow through dedicated IQs. IE not to mix
indirect interrupts with CPL messages in an IQ. This patch adds the
concept of RDMA concentrator IQs, or CIQs, setup and maintained by the
LLD and exported to iw_cxgb4 for use when creating CQs. RDMA CPLs will
flow through the LLD's RDMA RXQs, and CQ interrupts flow through the
CIQs.

Design:

cxgb4 creates and exports an array of CIQs for the RDMA ULD. These IQs
are sized according to the max available CQs available at adapter init.
In addition, these IQs don't need FL buffers since they only service
indirect interrupts. One CIQ is setup per RX channel similar to the
RDMA RXQs.

iw_cxgb4 will utilize these CIQs based on the vector value passed into
create_cq(). The num_comp_vectors advertised by iw_cxgb4 will be the
number of CIQs configured, and thus the vector value will be the index
into the array of CIQs.

Based on original work by Steve Wise <swise@opengridcomputing.com>

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b6f04d3d 05-May-2014 Yann Droneaud <ydroneaud@opteya.com>

RDMA/cxgb4: Add missing padding at end of struct c4iw_create_cq_resp

The i386 ABI disagrees with most other ABIs regarding alignment of
data types larger than 4 bytes: on most ABIs a padding must be added
at end of the structures, while it is not required on i386.

So for most ABI struct c4iw_create_cq_resp gets implicitly padded
to be aligned on a 8 bytes multiple, while for i386, such padding
is not added.

The tool pahole can be used to find such implicit padding:

$ pahole --anon_include \
--nested_anon_include \
--recursive \
--class_name c4iw_create_cq_resp \
drivers/infiniband/hw/cxgb4/iw_cxgb4.o

Then, structure layout can be compared between i386 and x86_64:

+++ obj-i386/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt 2014-03-28 11:43:05.547432195 +0100
--- obj-x86_64/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt 2014-03-28 10:55:10.990133017 +0100
@@ -14,9 +13,8 @@ struct c4iw_create_cq_resp {
__u32 size; /* 28 4 */
__u32 qid_mask; /* 32 4 */

- /* size: 36, cachelines: 1, members: 6 */
- /* last cacheline: 36 bytes */
+ /* size: 40, cachelines: 1, members: 6 */
+ /* padding: 4 */
+ /* last cacheline: 40 bytes */
};

This ABI disagreement will make an x86_64 kernel try to write past the
buffer provided by an i386 binary.

When boundary check will be implemented, the x86_64 kernel will refuse
to write past the i386 userspace provided buffer and the uverbs will
fail.

If the structure is on a page boundary and the next page is not
mapped, ib_copy_to_udata() will fail and the uverb will fail.

This patch adds an explicit padding at end of structure
c4iw_create_cq_resp, and, like 92b0ca7cb149 ("IB/mlx5: Fix stack info
leak in mlx5_ib_alloc_ucontext()"), makes function c4iw_create_cq()
not writting this padding field to userspace. This way, x86_64 kernel
will be able to write struct c4iw_create_cq_resp as expected by
unpatched and patched i386 libcxgb4.

Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com
Cc: <stable@vger.kernel.org>
Fixes: cfdda9d764362 ("RDMA/cxgb4: Add driver for Chelsio T4 RNIC")
Fixes: e24a72a3302a6 ("RDMA/cxgb4: Fix four byte info leak in c4iw_create_cq()")
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 97df1c67 09-Apr-2014 Steve Wise <larrystevenwise@gmail.com>

RDMA/cxgb4: Use uninitialized_var()

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# b4e2901c 09-Apr-2014 Steve Wise <larrystevenwise@gmail.com>

RDMA/cxgb4: SQ flush fix

There is a race when moving a QP from RTS->CLOSING where a SQ work
request could be posted after the FW receives the RDMA_RI/FINI WR.
The SQ work request will never get processed, and should be completed
with FLUSHED status. Function c4iw_flush_sq(), however was dropping
the oldest SQ work request when in CLOSING or IDLE states, instead of
completing the pending work request. If that oldest pending work
request was actually complete and has a CQE in the CQ, then when that
CQE is proceessed in poll_cq, we'll BUG_ON() due to the inconsistent
SQ/CQ state.

This is a very small timing hole and has only been hit once so far.

The fix is two-fold:

1) c4iw_flush_sq() MUST always flush all non-completed WRs with FLUSHED
status regardless of the QP state.

2) In c4iw_modify_rc_qp(), always set the "in error" bit on the queue
before moving the state out of RTS. This ensures that the state
transition will not happen while another thread is in
post_rc_send(), because set_state() and post_rc_send() both aquire
the qp spinlock. Also, once we transition the state out of RTS,
subsequent calls to post_rc_send() will fail because the "in error"
bit is set. I don't think this fully closes the race where the FW
can get a FINI followed a SQ work request being posted (because
they are posted to differente EQs), but the #1 fix will handle the
issue by flushing the SQ work request.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 70b9c660 21-Mar-2014 Steve Wise <larrystevenwise@gmail.com>

RDMA/cxgb4: Ignore read reponse type 1 CQEs

These are generated by HW in some error cases and need to be
silently discarded.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 8a9c399e 19-Mar-2014 Steve Wise <larrystevenwise@gmail.com>

RDMA/cxgb4: Fix incorrect BUG_ON conditions

Based on original work from Jay Hernandez <jay@chelsio.com>

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# ffd43592 19-Mar-2014 Steve Wise <larrystevenwise@gmail.com>

RDMA/cxgb4: Cap CQ size at T4_MAX_IQ_SIZE

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# e24a72a3 18-Oct-2013 Dan Carpenter <dan.carpenter@oracle.com>

RDMA/cxgb4: Fix four byte info leak in c4iw_create_cq()

There is a four byte hole at the end of the "uresp" struct after the
->qid_mask member.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 27ca34f5 06-Aug-2013 Steve Wise <larrystevenwise@gmail.com>

RDMA/cxgb4: Fix accounting for unsignaled SQ WRs to deal with wrap

When determining how many WRs are completed with a signaled CQE,
correctly deal with queue wraps.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 1cf24dce 06-Aug-2013 Steve Wise <larrystevenwise@gmail.com>

RDMA/cxgb4: Fix QP flush logic

This patch makes following fixes in QP flush logic:

- correctly flushes unsignaled WRs followed by a signaled WR
- supports for flushing a CQ bound to multiple QPs
- resets cidx_flush if a active queue starts getting HW CQEs again
- marks WQ in error when we leave RTS. This was only being done for
user queues, but we need it for kernel queues too so that
post_send/post_recv will start returning the appropriate error
synchronously
- eats unsignaled read resp CQEs. HW always inserts CQEs so we must
silently discard them if the read work request was unsignaled.
- handles QP flushes with pending SW CQEs. The flush and out of order
completion logic has a bug where if out of order completions are
flushed but not yet polled by the consumer and the qp is then
flushed then we end up inserting duplicate completions.
- c4iw_flush_sq() should only flush wrs that have not already been
flushed. Since we already track where in the SQ we've flushed via
sq.cidx_flush, just start at that point and flush any remaining.
This bug only caused a problem in the presence of unsignaled work
requests.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>

[ Fixed sparse warning due to htonl/ntohl confusion. - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>


# c34c97ad 20-Oct-2011 Jonathan Lallinger <jonathan@ogc.us>

RDMA/cxgb4: Fix iw_cxgb4 count_rcqes() logic

Fix another place in the code where logic dealing with the t4_cqe was
using the wrong QID. This fixes the counting logic so that it tests
against the SQ QID instead of the RQ QID when counting RCQES.

Signed-off by: Jonathan Lallinger <jonathan@ogc.us>
Signed-off by: Steve Wise <swise@ogc.us>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# 581bbe2c 24-Oct-2011 Kumar Sanghvi <kumaras@chelsio.com>

RDMA/cxgb4: Serialize calls to CQ's comp_handler

Commit 01e7da6ba53c ("RDMA/cxgb4: Make sure flush CQ entries are
collected on connection close") introduced a potential problem where a
CQ's comp_handler can get called simultaneously from different places
in the iw_cxgb4 driver. This does not comply with
Documentation/infiniband/core_locking.txt, which states that at a
given point of time, there should be only one callback per CQ should
be active.

This problem was reported by Parav Pandit <Parav.Pandit@Emulex.Com>.
Based on discussion between Parav Pandit and Steve Wise, this patch
fixes the above problem by serializing the calls to a CQ's
comp_handler using a spin_lock.

Reported-by: Parav Pandit <Parav.Pandit@Emulex.Com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# e14d62c0 13-Oct-2011 Jonathan Lallinger <jonathan@ogc.us>

RDMA/cxgb4: Use correct QID in insert_recv_cqe()

When creating flushed receive CQEs, set the QPID field in the t4_cqe
to the SQ QID and not the RQ QID. Otherwise the poll code will not
find the correct QP context.

Signed-off by: Jonathan Lallinger <jonathan@ogc.us>
Signed-off by: Steve Wise <swise@ogc.us>

Signed-off-by: Roland Dreier <roland@purestorage.com>


# 2ff7d09a 01-Jun-2011 Steve Wise <larrystevenwise@gmail.com>

RDMA/cxgb4: Don't exceed hw IQ depth limit for user CQs

Memory allocated for user CQs gets rounded up to the next page
boundary. And after rounding, we recalculate the resulting IQ depth
and we need to make sure we don't exceed the HW limits.

This bug can result a much smaller CQ allocated than was expected if
the HW size field is exceeded, resulting in CQ overflow failures.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>


# aadc4df3 10-Sep-2010 Steve Wise <larrystevenwise@gmail.com>

RDMA/cxgb4: Centralize the wait logic

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 6ff0e343 10-Sep-2010 Steve Wise <larrystevenwise@gmail.com>

RDMA/cxgb4: Ignore TERMINATE CQEs

T4 incorrectly inserts TERM CQEs into the CQ. Silently ignore them.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# c8e081a1 27-Sep-2010 Roland Dreier <rolandd@cisco.com>

RDMA/cxgb4: Fix warnings about casts to/from pointers of different sizes

Fix:

drivers/infiniband/hw/cxgb4/qp.c: In function ‘create_qp’:
drivers/infiniband/hw/cxgb4/qp.c:147: warning: cast from pointer to integer of different size
drivers/infiniband/hw/cxgb4/qp.c: In function ‘rdma_fini’:
drivers/infiniband/hw/cxgb4/qp.c:988: warning: cast from pointer to integer of different size
drivers/infiniband/hw/cxgb4/qp.c: In function ‘rdma_init’:
drivers/infiniband/hw/cxgb4/qp.c:1063: warning: cast from pointer to integer of different size
drivers/infiniband/hw/cxgb4/mem.c: In function ‘write_adapter_mem’:
drivers/infiniband/hw/cxgb4/mem.c:74: warning: cast from pointer to integer of different size
drivers/infiniband/hw/cxgb4/cq.c: In function ‘destroy_cq’:
drivers/infiniband/hw/cxgb4/cq.c:58: warning: cast from pointer to integer of different size
drivers/infiniband/hw/cxgb4/cq.c: In function ‘create_cq’:
drivers/infiniband/hw/cxgb4/cq.c:135: warning: cast from pointer to integer of different size
drivers/infiniband/hw/cxgb4/cm.c: In function ‘fw6_msg’:
drivers/infiniband/hw/cxgb4/cm.c:2326: warning: cast to pointer from integer of different size

by casting pointers to unsigned long instead of u64.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# d3c814e8 20-Jul-2010 David Rientjes <rientjes@google.com>

RDMA/cxgb4: Remove dependency on __GFP_NOFAIL

The alloc_skb() in various allocations are failable, so remove
__GFP_NOFAIL from their masks.

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 1973e8b8 10-Jun-2010 Steve Wise <larrystevenwise@gmail.com>

RDMA/cxgb4: Avoid false GTS CIDX_INC overflows

The T4 IQ hw design assumes CIDX_INC credits will be returned on a
regular basis and always before the CIDX counter crosses over the PIDX
counter. For RDMA CQs, however, returning CIDX_INC credits is only
needed and desired when and if the CQ is armed for notification. This
can lead to a GTS write returning credits that causes the HW to reject
the credit update because it causes CIDX to pass PIDX. Once this
happens, the CIDX/PIDX counters get out of whack and an application
can miss a notification and get stuck blocked awaiting a notification.

To avoid this, we allocate the HW IQ 2x times the requested size.
This seems to avoid the false overflow failures. If we see more
issues with this, then we'll have to add code in the poll path to
return credits periodically like when the amount reaches 1/2 the queue
depth). I would like to avoid this as it adds a PCI write transaction
for applications that never arm the CQ (like most MPIs).

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# f38926aa 02-Jun-2010 FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>

RDMA/cxgb4: Use the DMA state API instead of the pci equivalents

This replace the PCI DMA state API (include/linux/pci-dma.h) with the
DMA equivalents since the PCI DMA state API will be obsolete.

No functional change.

For further information about the background:

http://marc.info/?l=linux-netdev&m=127037540020276&w=2

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 84172dee 20-May-2010 Steve Wise <larrystevenwise@gmail.com>

RDMA/cxgb4: Optimize CQ overflow detection

1) save the timestamp flit in the cq when we consume a CQE.

2) always compare the saved flit with the previous entry flit when
reading the next CQE entry. If the flits don't compare, then we
have overflowed.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# 895cf5f3 20-May-2010 Steve Wise <larrystevenwise@gmail.com>

RDMA/cxgb4: CQ size must be IQ size - 2

We need 1 extra entry for the status page and 1 to always have 1 free
entry to detect when the queue is full.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>


# cfdda9d7 21-Apr-2010 Steve Wise <larrystevenwise@gmail.com>

RDMA/cxgb4: Add driver for Chelsio T4 RNIC

Add an RDMA/iWARP driver for Chelsio T4 Ethernet adapters.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>