History log of /linux-master/drivers/infiniband/hw/hfi1/ipoib_tx.c
Revision Date Author Comments
# 4ececeb8 11-Sep-2023 David Ahern <dsahern@kernel.org>

IB/hfi1: Remove open coded reference to skb frag offset

frag offset should be obtained from skb_frag_off; update the code.

Signed-off-by: David Ahern <dsahern@kernel.org>
Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://lore.kernel.org/r/20230911215753.12325-1-dsahern@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# c9358de1 18-May-2023 Brendan Cunningham <bcunningham@cornelisnetworks.com>

IB/hfi1: Fix wrong mmu_node used for user SDMA packet after invalidate

The hfi1 user SDMA pinned-page cache will leave a stale cache entry when
the cache-entry's virtual address range is invalidated but that cache
entry is in-use by an outstanding SDMA request.

Subsequent user SDMA requests with buffers in or spanning the virtual
address range of the stale cache entry will result in packets constructed
from the wrong memory, the physical pages pointed to by the stale cache
entry.

To fix this, remove mmu_rb_node cache entries from the mmu_rb_handler
cache independent of the cache entry's refcount. Add 'struct kref
refcount' to struct mmu_rb_node and manage mmu_rb_node lifetime with
kref_get() and kref_put().

mmu_rb_node.refcount makes sdma_mmu_node.refcount redundant. Remove
'atomic_t refcount' from struct sdma_mmu_node and change sdma_mmu_node
code to use mmu_rb_node.refcount.

Move the mmu_rb_handler destructor call after a
wait-for-SDMA-request-completion call so mmu_rb_nodes that need
mmu_rb_handler's workqueue to queue themselves up for destruction from an
interrupt context may do so.

Fixes: f48ad614c100 ("IB/hfi1: Move driver out of staging")
Fixes: 00cbce5cbf88 ("IB/hfi1: Fix bugs with non-PAGE_SIZE-end multi-iovec user SDMA requests")
Link: https://lore.kernel.org/r/168451393605.3700681.13493776139032178861.stgit@awfm-02.cornelisnetworks.com
Reviewed-by: Dean Luick <dean.luick@cornelisnetworks.com>
Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 00cbce5c 06-Apr-2023 Patrick Kelsey <pat.kelsey@cornelisnetworks.com>

IB/hfi1: Fix bugs with non-PAGE_SIZE-end multi-iovec user SDMA requests

hfi1 user SDMA request processing has two bugs that can cause data
corruption for user SDMA requests that have multiple payload iovecs
where an iovec other than the tail iovec does not run up to the page
boundary for the buffer pointed to by that iovec.a

Here are the specific bugs:
1. user_sdma_txadd() does not use struct user_sdma_iovec->iov.iov_len.
Rather, user_sdma_txadd() will add up to PAGE_SIZE bytes from iovec
to the packet, even if some of those bytes are past
iovec->iov.iov_len and are thus not intended to be in the packet.
2. user_sdma_txadd() and user_sdma_send_pkts() fail to advance to the
next iovec in user_sdma_request->iovs when the current iovec
is not PAGE_SIZE and does not contain enough data to complete the
packet. The transmitted packet will contain the wrong data from the
iovec pages.

This has not been an issue with SDMA packets from hfi1 Verbs or PSM2
because they only produce iovecs that end short of PAGE_SIZE as the tail
iovec of an SDMA request.

Fixing these bugs exposes other bugs with the SDMA pin cache
(struct mmu_rb_handler) that get in way of supporting user SDMA requests
with multiple payload iovecs whose buffers do not end at PAGE_SIZE. So
this commit fixes those issues as well.

Here are the mmu_rb_handler bugs that non-PAGE_SIZE-end multi-iovec
payload user SDMA requests can hit:
1. Overlapping memory ranges in mmu_rb_handler will result in duplicate
pinnings.
2. When extending an existing mmu_rb_handler entry (struct mmu_rb_node),
the mmu_rb code (1) removes the existing entry under a lock, (2)
releases that lock, pins the new pages, (3) then reacquires the lock
to insert the extended mmu_rb_node.

If someone else comes in and inserts an overlapping entry between (2)
and (3), insert in (3) will fail.

The failure path code in this case unpins _all_ pages in either the
original mmu_rb_node or the new mmu_rb_node that was inserted between
(2) and (3).
3. In hfi1_mmu_rb_remove_unless_exact(), mmu_rb_node->refcount is
incremented outside of mmu_rb_handler->lock. As a result, mmu_rb_node
could be evicted by another thread that gets mmu_rb_handler->lock and
checks mmu_rb_node->refcount before mmu_rb_node->refcount is
incremented.
4. Related to #2 above, SDMA request submission failure path does not
check mmu_rb_node->refcount before freeing mmu_rb_node object.

If there are other SDMA requests in progress whose iovecs have
pointers to the now-freed mmu_rb_node(s), those pointers to the
now-freed mmu_rb nodes will be dereferenced when those SDMA requests
complete.

Fixes: 7be85676f1d1 ("IB/hfi1: Don't remove RB entry when not needed.")
Fixes: 7724105686e7 ("IB/hfi1: add driver files")
Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com>
Signed-off-by: Patrick Kelsey <pat.kelsey@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://lore.kernel.org/r/168088636445.3027109.10054635277810177889.stgit@252.162.96.66.static.eigbox.net
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# c874ad87 27-Feb-2023 Kang Chen <void0red@gmail.com>

IB/hifi1: add a null check of kzalloc_node in hfi1_ipoib_txreq_init

kzalloc_node may fails, check it and do the cleanup.

Fixes: b1151b74ff68 ("IB/hfi1: Fix alloc failure with larger txqueuelen")
Signed-off-by: Kang Chen <void0red@gmail.com>
Link: https://lore.kernel.org/r/20230227100212.910820-1-void0red@gmail.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# 6b81b707 05-Jul-2022 Jakub Kicinski <kuba@kernel.org>

IB/hfi1: switch to netif_napi_add_tx()

Switch to the new API not requiring the NAPI_POLL_WEIGHT argument.

Link: https://lore.kernel.org/r/20220705230208.924408-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# b1151b74 15-Jan-2022 Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>

IB/hfi1: Fix alloc failure with larger txqueuelen

The following allocation with large txqueuelen will result in the
following warning:

Call Trace:
__alloc_pages_nodemask+0x283/0x2c0
kmalloc_large_node+0x3c/0xa0
__kmalloc_node+0x22a/0x2f0
hfi1_ipoib_txreq_init+0x19f/0x330 [hfi1]
hfi1_ipoib_setup_rn+0xd3/0x1a0 [hfi1]
rdma_init_netdev+0x5a/0x80 [ib_core]
ipoib_intf_init+0x6c/0x350 [ib_ipoib]
ipoib_intf_alloc+0x5c/0xc0 [ib_ipoib]
ipoib_add_one+0xbe/0x300 [ib_ipoib]
add_client_context+0x12c/0x1a0 [ib_core]
ib_register_client+0x147/0x190 [ib_core]
ipoib_init_module+0xdd/0x132 [ib_ipoib]
do_one_initcall+0x46/0x1c3
do_init_module+0x5a/0x220
load_module+0x14c5/0x17f0
__do_sys_init_module+0x13b/0x180
do_syscall_64+0x5b/0x1a0
entry_SYSCALL_64_after_hwframe+0x65/0xca

For ipoib, the txqueuelen is modified with the module parameter
send_queue_size.

Fix by changing to use kv versions of the same allocator to handle the
large allocations. The allocation embeds a hdr struct that is dma mapped.
Change that struct to a pointer to a kzalloced struct.

Cc: stable@vger.kernel.org
Fixes: d99dc602e2a5 ("IB/hfi1: Add functions to transmit datagram ipoib packets")
Link: https://lore.kernel.org/r/1642287756-182313-3-git-send-email-mike.marciniszyn@cornelisnetworks.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 8c83d39c 15-Jan-2022 Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>

IB/hfi1: Fix panic with larger ipoib send_queue_size

When the ipoib send_queue_size is increased from the default the following
panic happens:

RIP: 0010:hfi1_ipoib_drain_tx_ring+0x45/0xf0 [hfi1]
Code: 31 e4 eb 0f 8b 85 c8 02 00 00 41 83 c4 01 44 39 e0 76 60 8b 8d cc 02 00 00 44 89 e3 be 01 00 00 00 d3 e3 48 03 9d c0 02 00 00 <c7> 83 18 01 00 00 00 00 00 00 48 8b bb 30 01 00 00 e8 25 af a7 e0
RSP: 0018:ffffc9000798f4a0 EFLAGS: 00010286
RAX: 0000000000008000 RBX: ffffc9000aa0f000 RCX: 000000000000000f
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
RBP: ffff88810ff08000 R08: ffff88889476d900 R09: 0000000000000101
R10: 0000000000000000 R11: ffffc90006590ff8 R12: 0000000000000200
R13: ffffc9000798fba8 R14: 0000000000000000 R15: 0000000000000001
FS: 00007fd0f79cc3c0(0000) GS:ffff88885fb00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc9000aa0f118 CR3: 0000000889c84001 CR4: 00000000001706e0
Call Trace:
<TASK>
hfi1_ipoib_napi_tx_disable+0x45/0x60 [hfi1]
hfi1_ipoib_dev_stop+0x18/0x80 [hfi1]
ipoib_ib_dev_stop+0x1d/0x40 [ib_ipoib]
ipoib_stop+0x48/0xc0 [ib_ipoib]
__dev_close_many+0x9e/0x110
__dev_change_flags+0xd9/0x210
dev_change_flags+0x21/0x60
do_setlink+0x31c/0x10f0
? __nla_validate_parse+0x12d/0x1a0
? __nla_parse+0x21/0x30
? inet6_validate_link_af+0x5e/0xf0
? cpumask_next+0x1f/0x20
? __snmp6_fill_stats64.isra.53+0xbb/0x140
? __nla_validate_parse+0x47/0x1a0
__rtnl_newlink+0x530/0x910
? pskb_expand_head+0x73/0x300
? __kmalloc_node_track_caller+0x109/0x280
? __nla_put+0xc/0x20
? cpumask_next_and+0x20/0x30
? update_sd_lb_stats.constprop.144+0xd3/0x820
? _raw_spin_unlock_irqrestore+0x25/0x37
? __wake_up_common_lock+0x87/0xc0
? kmem_cache_alloc_trace+0x3d/0x3d0
rtnl_newlink+0x43/0x60

The issue happens when the shift that should have been a function of the
txq item size mistakenly used the ring size.

Fix by using the item size.

Cc: stable@vger.kernel.org
Fixes: d47dfc2b00e6 ("IB/hfi1: Remove cache and embed txreq in ring")
Link: https://lore.kernel.org/r/1642287756-182313-2-git-send-email-mike.marciniszyn@cornelisnetworks.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 6d1ebccb 13-Sep-2021 Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>

IB/hfi1: Add ring consumer and producers traces

These traces are used to debugging ring issues.

The ipoib_txreq needed to be moved to a header file to allow access from
the trace header file.

The trace changes include:
- new producer/consumer traces
- new allocation deallocation traces
- additional fidelity for SDMA engine prints

Fixes: 4bd00b55c978 ("IB/hfi1: Add AIP tx traces")
Link: https://lore.kernel.org/r/20210913132852.131370.9664.stgit@awfm-01.cornelisnetworks.com
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# b4b90a50 13-Sep-2021 Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>

IB/hfi1: Remove atomic completion count

The atomic is not needed.

Fixes: d99dc602e2a5 ("IB/hfi1: Add functions to transmit datagram ipoib packets")
Link: https://lore.kernel.org/r/20210913132847.131370.54250.stgit@awfm-01.cornelisnetworks.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# f5dc70a0 13-Sep-2021 Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>

IB/hfi1: Tune netdev xmit cachelines

This patch moves fields in the ring and creates a line for the producer
and the consumer.

The adds a consumer side variable that tracks the ring avail so that the
code doesn't have the read the other cacheline to get a count for every
packet. A read now only occurs when the avail is at 0.

Fixes: d99dc602e2a5 ("IB/hfi1: Add functions to transmit datagram ipoib packets")
Link: https://lore.kernel.org/r/20210913132842.131370.15636.stgit@awfm-01.cornelisnetworks.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# a7125869 13-Sep-2021 Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>

IB/hfi1: Get rid of tx priv backpointer

The txq has the backpointer, so this is a micro optimization for the tx
path.

Fixes: d99dc602e2a5 ("IB/hfi1: Add functions to transmit datagram ipoib packets")
Link: https://lore.kernel.org/r/20210913132836.131370.89704.stgit@awfm-01.cornelisnetworks.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 4bf0ca0c 13-Sep-2021 Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>

IB/hfi1: Get rid of hot path divide

The pointer math in this statemet does a divide;
struct hfi1_ipoib_txq *txq = &priv->txqs[napi - priv->tx_napis];

Elminate the divide by embedding the struct napi_strut in the txq and
getting the txq with a container_of() using the newly embedded napi.

Fixes: d99dc602e2a5 ("IB/hfi1: Add functions to transmit datagram ipoib packets")
Link: https://lore.kernel.org/r/20210913132831.131370.3993.stgit@awfm-01.cornelisnetworks.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# d47dfc2b 13-Sep-2021 Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>

IB/hfi1: Remove cache and embed txreq in ring

This patch removes kmem cache allocation and deallocation in favor of
having the ipoib_txreq in the ring.

The consumer is now the packet sending side allocating tx descriptors from
ring and the producer is the napi interrupt handling freeing tx
descriptors.

The locks are now eliminated because the napi tx lock insures a single
consumer and the napi handling insures a single producer.

The napi poll is converted to memory poll looking for items that have been
marked completed.

Fixes: d99dc602e2a5 ("IB/hfi1: Add functions to transmit datagram ipoib packets")
Link: https://lore.kernel.org/r/20210913132826.131370.4397.stgit@awfm-01.cornelisnetworks.com
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 7d5cfafe 22-Sep-2021 Guo Zhi <qtxuning1999@sjtu.edu.cn>

RDMA/hfi1: Fix kernel pointer leak

Pointers should be printed with %p or %px rather than cast to 'unsigned
long long' and printed with %llx. Change %llx to %p to print the secured
pointer.

Fixes: 042a00f93aad ("IB/{ipoib,hfi1}: Add a timeout handler for rdma_netdev")
Link: https://lore.kernel.org/r/20210922134857.619602-1-qtxuning1999@sjtu.edu.cn
Signed-off-by: Guo Zhi <qtxuning1999@sjtu.edu.cn>
Acked-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# e9901043 14-Jul-2021 Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>

IB/hfi1: Indicate DMA wait when txq is queued for wakeup

There is no counter for dmawait in AIP, which hampers debugging
performance issues.

Add the counter increment when the txq is queued.

Fixes: d99dc602e2a5 ("IB/hfi1: Add functions to transmit datagram ipoib packets")
Link: https://lore.kernel.org/r/20210715160440.142451.8278.stgit@awfm-01.cornelisnetworks.com
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 326a2393 29-Mar-2021 Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>

IB/hfi1: Remove indirect call to hfi1_ipoib_send_dma()

hfi1_ipoib_send() directly calls hfi1_ipoib_send_dma() with no value add.

Fix by renaming hfi1_ipoib_send_dma() to hfi1_ipoib_send().

Link: https://lore.kernel.org/r/1617026056-50483-6-git-send-email-dennis.dalessandro@cornelisnetworks.com
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 70d44c18 29-Mar-2021 Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>

IB/hfi1: Use napi_schedule_irqoff() for tx napi

The call is from an ISR context and napi_schedule_irqoff() can be used.

Change the call to the more efficient type.

Link: https://lore.kernel.org/r/1617026056-50483-5-git-send-email-dennis.dalessandro@cornelisnetworks.com
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# b536d4b2 29-Mar-2021 Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>

IB/hfi1: Correct oversized ring allocation

The completion ring for tx is using the wrong size to size the ring,
oversizing the ring by two orders of magniture.

Correct the allocation size and use kcalloc_node() to allocate the ring.
Fix mistaken GFP defines in similar allocations.

Link: https://lore.kernel.org/r/1617026056-50483-4-git-send-email-dennis.dalessandro@cornelisnetworks.com
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 042a00f9 29-Mar-2021 Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>

IB/{ipoib,hfi1}: Add a timeout handler for rdma_netdev

The current rdma_netdev handling in ipoib hooks the tx_timeout handler,
but prints out a totally useless message that prevents effective debugging
especially when multiple transmit queues are being used.

Add a tx_timeout rdma_netdev hook and implement the callback in the hfi1
to print additional information.

The existing non-helpful message is avoided when the driver has presented
a callback.

Link: https://lore.kernel.org/r/1617026056-50483-3-git-send-email-dennis.dalessandro@cornelisnetworks.com
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 4bd00b55 29-Mar-2021 Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>

IB/hfi1: Add AIP tx traces

Add traces to allow for debugging issues with AIP tx.

Link: https://lore.kernel.org/r/1617026056-50483-2-git-send-email-dennis.dalessandro@cornelisnetworks.com
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# aa0616a9 10-Nov-2020 Heiner Kallweit <hkallweit1@gmail.com>

IB/hfi1: switch to core handling of rx/tx byte/packet counters

Use netdev->tstats instead of a member of hfi1_ipoib_dev_priv for storing
a pointer to the per-cpu counters. This allows us to use core
functionality for statistics handling.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 38fd98af 23-Jun-2020 Mike Marciniszyn <mike.marciniszyn@intel.com>

IB/hfi1: Add atomic triggered sleep/wakeup

When running iperf in a two host configuration the following trace can
occur:

[ 319.728730] NETDEV WATCHDOG: ib0 (hfi1): transmit queue 0 timed out

The issue happens because the current implementation relies on the netif
txq being stopped to control the flushing of the tx list.

There are two resources that the transmit logic can wait on and stop the
txq:
- SDMA descriptors
- Ring space to hold completions

The ring space is tested on the sending side and relieved when the ring is
consumed in the napi tx reaping.

Unfortunately, that reaping can run conncurrently with the workqueue
flushing of the txlist. If the txq is started just before the workitem
executes, the txlist will never be flushed, leading to the txq being
stuck.

Fix by:
- Adding sleep/wakeup wrappers
* Use an atomic to control the call to the netif routines inside the
wrappers

- Use another atomic to record ring space exhaustion
* Only wakeup when the a ring space exhaustion has happened and it
relieved

Add additional wrappers to clarify the ring space resource handling.

Fixes: d99dc602e2a5 ("IB/hfi1: Add functions to transmit datagram ipoib packets")
Link: https://lore.kernel.org/r/20200623204327.108092.4024.stgit@awfm-01.aw.intel.com
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 82172b76 23-Jun-2020 Mike Marciniszyn <mike.marciniszyn@intel.com>

IB/hfi1: Correct -EBUSY handling in tx code

The current code mishandles -EBUSY in two ways:
- The flow change doesn't test the return from the flush and runs on to
process the current packet racing with the wakeup processing
- The -EBUSY handling for a single packet inserts the tx into the txlist
after the submit call, racing with the same wakeup processing

Fix the first by dropping the skb and returning NETDEV_TX_OK.

Fix the second by insuring the the list entry within the txreq is inited
when allocated. This enables the sleep routine to detect that the txreq
has used the non-list api and queue the packet to the txlist.

Both flaws can lead to having the flushing thread executing in causing two
threads to manipulate the txlist.

Fixes: d99dc602e2a5 ("IB/hfi1: Add functions to transmit datagram ipoib packets")
Link: https://lore.kernel.org/r/20200623204321.108092.83898.stgit@awfm-01.aw.intel.com
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 0dc63bbe 09-Jun-2020 Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>

RDMA/hfi1: Fix trivial mis-spelling of 'descriptor'

The word 'descriptor' is misspelled throughout the tree.

Fix it up accordingly:
decriptors -> descriptors

Link: https://lore.kernel.org/r/20200609124610.3445662-3-kieran.bingham+renesas@ideasonboard.com
Link: https://lore.kernel.org/r/20200609124610.3445662-12-kieran.bingham+renesas@ideasonboard.com
Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# d99dc602 10-May-2020 Gary Leshner <Gary.S.Leshner@intel.com>

IB/hfi1: Add functions to transmit datagram ipoib packets

This patch implements the mechanism to accelerate the transmit side of
a multiple transmit queue RDMA netdev by submitting the packets to
the SDMA engine directly instead of sending through the verbs layer.

This patch also changes the UD/SEND_ONLY op to output the entropy value
in byte 0 of deth[1]. UD/SEND_ONLY_WITH_IMMEDIATE uses the previous
behavior with no entropy value being output.

The code in the ipoib rdma netdev which submits tx requests upon
successful submission will call trace_sdma_output_ibhdr to output
the ibhdr to the trace buffer.

Link: https://lore.kernel.org/r/20200511160548.173205.45616.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Gary Leshner <Gary.S.Leshner@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>