History log of /linux-master/drivers/infiniband/hw/hfi1/tid_rdma.c
Revision Date Author Comments
# 4c09f740 10-Jan-2024 Randy Dunlap <rdunlap@infradead.org>

IB/hfi1: fix spellos and kernel-doc

Fix spelling mistakes as reported by codespell.
Fix kernel-doc warnings.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Leon Romanovsky <leonro@nvidia.com>
Cc: linux-rdma@vger.kernel.org
Link: https://lore.kernel.org/r/20240111042754.17530-1-rdunlap@infradead.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# a251c17a 05-Oct-2022 Jason A. Donenfeld <Jason@zx2c4.com>

treewide: use get_random_u32() when possible

The prandom_u32() function has been a deprecated inline wrapper around
get_random_u32() for several releases now, and compiles down to the
exact same code. Replace the deprecated wrapper with a direct call to
the real function. The same also applies to get_random_int(), which is
just a wrapper around get_random_u32(). This was done as a basic find
and replace.

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz> # for ext4
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake
Acked-by: Chuck Lever <chuck.lever@oracle.com> # for nfsd
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # for thunderbolt
Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
Acked-by: Helge Deller <deller@gmx.de> # for parisc
Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 991c4274 29-Jul-2021 Cai Huoqing <caihuoqing@baidu.com>

RDMA/hfi1: Fix typo in comments

Remove the repeated word 'the' from comments

Link: https://lore.kernel.org/r/20210729082346.1882-1-caihuoqing@baidu.com
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 620ccaaa 10-May-2021 Zhen Lei <thunder.leizhen@huawei.com>

IB/hfi1: Delete an unneeded bool conversion

The result of an expression consisting of a single relational operator is
already of the bool type and does not need to be evaluated explicitly.

No functional change.

Link: https://lore.kernel.org/r/20210510120635.3636-1-thunder.leizhen@huawei.com
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# cd7727fd 25-Jan-2021 Lee Jones <lee.jones@linaro.org>

RDMA/hw/hfi1/tid_rdma: Fix a plethora of kernel-doc issues

Fixes the following W=1 kernel build warning(s):

drivers/infiniband/hw/hfi1/tid_rdma.c:321: warning: Function parameter or member 'rdi' not described in 'qp_to_rcd'
drivers/infiniband/hw/hfi1/tid_rdma.c:321: warning: Function parameter or member 'qp' not described in 'qp_to_rcd'
drivers/infiniband/hw/hfi1/tid_rdma.c:505: warning: Function parameter or member 'queue' not described in 'kernel_tid_waiters'
drivers/infiniband/hw/hfi1/tid_rdma.c:536: warning: Function parameter or member 'rcd' not described in 'dequeue_tid_waiter'
drivers/infiniband/hw/hfi1/tid_rdma.c:536: warning: Function parameter or member 'queue' not described in 'dequeue_tid_waiter'
drivers/infiniband/hw/hfi1/tid_rdma.c:536: warning: Function parameter or member 'qp' not described in 'dequeue_tid_waiter'
drivers/infiniband/hw/hfi1/tid_rdma.c:562: warning: Function parameter or member 'queue' not described in 'queue_qp_for_tid_wait'
drivers/infiniband/hw/hfi1/tid_rdma.c:607: warning: Function parameter or member 'qp' not described in 'tid_rdma_schedule_tid_wakeup'
drivers/infiniband/hw/hfi1/tid_rdma.c:639: warning: Function parameter or member 'work' not described in 'tid_rdma_trigger_resume'
drivers/infiniband/hw/hfi1/tid_rdma.c:666: warning: Function parameter or member 'qp' not described in '_tid_rdma_flush_wait'
drivers/infiniband/hw/hfi1/tid_rdma.c:666: warning: Function parameter or member 'queue' not described in '_tid_rdma_flush_wait'
drivers/infiniband/hw/hfi1/tid_rdma.c:713: warning: Function parameter or member 'rcd' not described in 'kern_reserve_flow'
drivers/infiniband/hw/hfi1/tid_rdma.c:713: warning: Function parameter or member 'last' not described in 'kern_reserve_flow'
drivers/infiniband/hw/hfi1/tid_rdma.c:879: warning: Function parameter or member 'flow' not described in 'tid_rdma_find_phys_blocks_4k'
drivers/infiniband/hw/hfi1/tid_rdma.c:879: warning: Function parameter or member 'pages' not described in 'tid_rdma_find_phys_blocks_4k'
drivers/infiniband/hw/hfi1/tid_rdma.c:879: warning: Function parameter or member 'npages' not described in 'tid_rdma_find_phys_blocks_4k'
drivers/infiniband/hw/hfi1/tid_rdma.c:879: warning: Function parameter or member 'list' not described in 'tid_rdma_find_phys_blocks_4k'
drivers/infiniband/hw/hfi1/tid_rdma.c:972: warning: Function parameter or member 'list' not described in 'tid_flush_pages'
drivers/infiniband/hw/hfi1/tid_rdma.c:972: warning: Function parameter or member 'idx' not described in 'tid_flush_pages'
drivers/infiniband/hw/hfi1/tid_rdma.c:972: warning: Function parameter or member 'pages' not described in 'tid_flush_pages'
drivers/infiniband/hw/hfi1/tid_rdma.c:972: warning: Function parameter or member 'sets' not described in 'tid_flush_pages'
drivers/infiniband/hw/hfi1/tid_rdma.c:1017: warning: Function parameter or member 'flow' not described in 'tid_rdma_find_phys_blocks_8k'
drivers/infiniband/hw/hfi1/tid_rdma.c:1017: warning: Function parameter or member 'pages' not described in 'tid_rdma_find_phys_blocks_8k'
drivers/infiniband/hw/hfi1/tid_rdma.c:1017: warning: Function parameter or member 'npages' not described in 'tid_rdma_find_phys_blocks_8k'
drivers/infiniband/hw/hfi1/tid_rdma.c:1017: warning: Function parameter or member 'list' not described in 'tid_rdma_find_phys_blocks_8k'
drivers/infiniband/hw/hfi1/tid_rdma.c:1083: warning: Function parameter or member 'flow' not described in 'kern_find_pages'
drivers/infiniband/hw/hfi1/tid_rdma.c:1083: warning: Function parameter or member 'pages' not described in 'kern_find_pages'
drivers/infiniband/hw/hfi1/tid_rdma.c:1083: warning: Function parameter or member 'ss' not described in 'kern_find_pages'
drivers/infiniband/hw/hfi1/tid_rdma.c:1083: warning: Function parameter or member 'last' not described in 'kern_find_pages'
drivers/infiniband/hw/hfi1/tid_rdma.c:1604: warning: Function parameter or member 'req' not described in 'hfi1_kern_exp_rcv_free_flows'
drivers/infiniband/hw/hfi1/tid_rdma.c:3458: warning: Function parameter or member 'qp' not described in 'hfi1_tid_write_alloc_resources'
drivers/infiniband/hw/hfi1/tid_rdma.c:3458: warning: Function parameter or member 'intr_ctx' not described in 'hfi1_tid_write_alloc_resources'

Link: https://lore.kernel.org/r/20210126124732.3320971-16-lee.jones@linaro.org
Cc: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 4846bf44 19-Nov-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

IB/hfi1: Fix fall-through warnings for Clang

In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple
warnings by explicitly adding multiple break statements instead of just
letting the code fall through to the next case.

Link: https://lore.kernel.org/r/13cc2fe2cf8a71a778dbb3d996b07f5e5d04fd40.1605896059.git.gustavoars@kernel.org
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Tested-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# b25e8e85 11-Aug-2020 Kaike Wan <kaike.wan@intel.com>

RDMA/hfi1: Correct an interlock issue for TID RDMA WRITE request

The following message occurs when running an AI application with TID RDMA
enabled:

hfi1 0000:7f:00.0: hfi1_0: [QP74] hfi1_tid_timeout 4084
hfi1 0000:7f:00.0: hfi1_0: [QP70] hfi1_tid_timeout 4084

The issue happens when TID RDMA WRITE request is followed by an
IB_WR_RDMA_WRITE_WITH_IMM request, the latter could be completed first on
the responder side. As a result, no ACK packet for the latter could be
sent because the TID RDMA WRITE request is still being processed on the
responder side.

When the TID RDMA WRITE request is eventually completed, the requester
will wait for the IB_WR_RDMA_WRITE_WITH_IMM request to be acknowledged.

If the next request is another TID RDMA WRITE request, no TID RDMA WRITE
DATA packet could be sent because the preceding IB_WR_RDMA_WRITE_WITH_IMM
request is not completed yet.

Consequently the IB_WR_RDMA_WRITE_WITH_IMM will be retried but it will be
ignored on the responder side because the responder thinks it has already
been completed. Eventually the retry will be exhausted and the qp will be
put into error state on the requester side. On the responder side, the TID
resource timer will eventually expire because no TID RDMA WRITE DATA
packets will be received for the second TID RDMA WRITE request. There is
also risk of a write-after-write memory corruption due to the issue.

Fix by adding a requester side interlock to prevent any potential data
corruption and TID RDMA protocol error.

Fixes: a0b34f75ec20 ("IB/hfi1: Add interlock between a TID RDMA request and other requests")
Link: https://lore.kernel.org/r/20200811174931.191210.84093.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org> # 5.4.x+
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 6f24b159 21-Jul-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

IB/hfi1: Use fallthrough pseudo-keyword

Replace the existing /* fall through */ comments and its variants with the
new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7-rc7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Link: https://lore.kernel.org/r/20200721133455.GA14363@embeddedor
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 28b70cd9 23-Jun-2020 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Do not destroy hfi1_wq when the device is shut down

The workqueue hfi1_wq is destroyed in function shutdown_device(), which is
called by either shutdown_one() or remove_one(). The function
shutdown_one() is called when the kernel is rebooted while remove_one() is
called when the hfi1 driver is unloaded. When the kernel is rebooted,
hfi1_wq is destroyed while all qps are still active, leading to a kernel
crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000102
IP: [<ffffffff94cb7b02>] __queue_work+0x32/0x3e0
PGD 0
Oops: 0000 [#1] SMP
Modules linked in: dm_round_robin nvme_rdma(OE) nvme_fabrics(OE) nvme_core(OE) ib_isert iscsi_target_mod target_core_mod ib_ucm mlx4_ib iTCO_wdt iTCO_vendor_support mxm_wmi sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm rpcrdma sunrpc irqbypass crc32_pclmul ghash_clmulni_intel rdma_ucm aesni_intel ib_uverbs lrw gf128mul opa_vnic glue_helper ablk_helper ib_iser cryptd ib_umad rdma_cm iw_cm ses enclosure libiscsi scsi_transport_sas pcspkr joydev ib_ipoib(OE) scsi_transport_iscsi ib_cm sg ipmi_ssif mei_me lpc_ich i2c_i801 mei ioatdma ipmi_si dm_multipath ipmi_devintf ipmi_msghandler wmi acpi_pad acpi_power_meter hangcheck_timer ip_tables ext4 mbcache jbd2 mlx4_en sd_mod crc_t10dif crct10dif_generic mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm hfi1(OE)
crct10dif_pclmul crct10dif_common crc32c_intel drm ahci mlx4_core libahci rdmavt(OE) igb megaraid_sas ib_core libata drm_panel_orientation_quirks ptp pps_core devlink dca i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod
CPU: 19 PID: 0 Comm: swapper/19 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.el7.x86_64 #1
Hardware name: Phegda X2226A/S2600CW, BIOS SE5C610.86B.01.01.0024.021320181901 02/13/2018
task: ffff8a799ba0d140 ti: ffff8a799bad8000 task.ti: ffff8a799bad8000
RIP: 0010:[<ffffffff94cb7b02>] [<ffffffff94cb7b02>] __queue_work+0x32/0x3e0
RSP: 0018:ffff8a90dde43d80 EFLAGS: 00010046
RAX: 0000000000000082 RBX: 0000000000000086 RCX: 0000000000000000
RDX: ffff8a90b924fcb8 RSI: 0000000000000000 RDI: 000000000000001b
RBP: ffff8a90dde43db8 R08: ffff8a799ba0d6d8 R09: ffff8a90dde53900
R10: 0000000000000002 R11: ffff8a90dde43de8 R12: ffff8a90b924fcb8
R13: 000000000000001b R14: 0000000000000000 R15: ffff8a90d2890000
FS: 0000000000000000(0000) GS:ffff8a90dde40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000102 CR3: 0000001a70410000 CR4: 00000000001607e0
Call Trace:
[<ffffffff94cb8105>] queue_work_on+0x45/0x50
[<ffffffffc03f781e>] _hfi1_schedule_send+0x6e/0xc0 [hfi1]
[<ffffffffc03f78a2>] hfi1_schedule_send+0x32/0x70 [hfi1]
[<ffffffffc02cf2d9>] rvt_rc_timeout+0xe9/0x130 [rdmavt]
[<ffffffff94ce563a>] ? trigger_load_balance+0x6a/0x280
[<ffffffffc02cf1f0>] ? rvt_free_qpn+0x40/0x40 [rdmavt]
[<ffffffff94ca7f58>] call_timer_fn+0x38/0x110
[<ffffffffc02cf1f0>] ? rvt_free_qpn+0x40/0x40 [rdmavt]
[<ffffffff94caa3bd>] run_timer_softirq+0x24d/0x300
[<ffffffff94ca0f05>] __do_softirq+0xf5/0x280
[<ffffffff9537832c>] call_softirq+0x1c/0x30
[<ffffffff94c2e675>] do_softirq+0x65/0xa0
[<ffffffff94ca1285>] irq_exit+0x105/0x110
[<ffffffff953796c8>] smp_apic_timer_interrupt+0x48/0x60
[<ffffffff95375df2>] apic_timer_interrupt+0x162/0x170
<EOI>
[<ffffffff951adfb7>] ? cpuidle_enter_state+0x57/0xd0
[<ffffffff951ae10e>] cpuidle_idle_call+0xde/0x230
[<ffffffff94c366de>] arch_cpu_idle+0xe/0xc0
[<ffffffff94cfc3ba>] cpu_startup_entry+0x14a/0x1e0
[<ffffffff94c57db7>] start_secondary+0x1f7/0x270
[<ffffffff94c000d5>] start_cpu+0x5/0x14

The solution is to destroy the workqueue only when the hfi1 driver is
unloaded, not when the device is shut down. In addition, when the device
is shut down, no more work should be scheduled on the workqueues and the
workqueues are flushed.

Fixes: 8d3e71136a08 ("IB/{hfi1, qib}: Add handling of kernel restart")
Link: https://lore.kernel.org/r/20200623204047.107638.77646.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 84e3b19a 10-May-2020 Gary Leshner <Gary.S.Leshner@intel.com>

IB/hfi1: Remove module parameter for KDETH qpns

The module parameter for KDETH qpns is being removed in favor
of always using the default value of 0x80 as the qpn prefix.
Defines have been added for various KDETH values including
the prefix of 0x80.
The reserved range now starts at the base value for KDETH
qpns (0x80) and extends up to and including the last qpn for
other reserved QP prefixed types.
Adjust other QP prefixed define names to match KDETH defined
names.

Link: https://lore.kernel.org/r/20200511160600.173205.27508.stgit@awfm-01.aw.intel.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Gary Leshner <Gary.S.Leshner@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# b2ff0d51 19-Dec-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Adjust flow PSN with the correct resync_psn

When a TID RDMA ACK to RESYNC request is received, the flow PSNs for
pending TID RDMA WRITE segments will be adjusted with the next flow
generation number, based on the resync_psn value extracted from the flow
PSN of the TID RDMA ACK packet. The resync_psn value indicates the last
flow PSN for which a TID RDMA WRITE DATA packet has been received by the
responder and the requester should resend TID RDMA WRITE DATA packets,
starting from the next flow PSN.

However, if resync_psn points to the last flow PSN for a segment and the
next segment flow PSN starts with a new generation number, use of the old
resync_psn to adjust the flow PSN for the next segment will lead to
miscalculation, resulting in WARN_ON and sge rewinding errors:

WARNING: CPU: 4 PID: 146961 at /nfs/site/home/phcvs2/gitrepo/ifs-all/components/Drivers/tmp/rpmbuild/BUILD/ifs-kernel-updates-3.10.0_957.el7.x86_64/hfi1/tid_rdma.c:4764 hfi1_rc_rcv_tid_rdma_ack+0x8f6/0xa90 [hfi1]
Modules linked in: ib_ipoib(OE) hfi1(OE) rdmavt(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfsv3 nfs_acl nfs lockd grace fscache iTCO_wdt iTCO_vendor_support skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel ib_isert iscsi_target_mod target_core_mod aesni_intel lrw gf128mul glue_helper ablk_helper cryptd rpcrdma sunrpc opa_vnic ast ttm ib_iser libiscsi drm_kms_helper scsi_transport_iscsi ipmi_ssif syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev ipmi_si pcspkr sg drm_panel_orientation_quirks ipmi_devintf lpc_ich i2c_i801 ipmi_msghandler wmi rdma_ucm ib_ucm ib_uverbs acpi_cpufreq acpi_power_meter ib_umad rdma_cm ib_cm iw_cm ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul i2c_algo_bit crct10dif_common
crc32c_intel e1000e ib_core ahci libahci ptp libata pps_core nfit libnvdimm [last unloaded: rdmavt]
CPU: 4 PID: 146961 Comm: kworker/4:0H Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.el7.x86_64 #1
Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.0X.02.0117.040420182310 04/04/2018
Workqueue: hfi0_0 _hfi1_do_tid_send [hfi1]
Call Trace:
<IRQ> [<ffffffff9e361dc1>] dump_stack+0x19/0x1b
[<ffffffff9dc97648>] __warn+0xd8/0x100
[<ffffffff9dc9778d>] warn_slowpath_null+0x1d/0x20
[<ffffffffc05d28c6>] hfi1_rc_rcv_tid_rdma_ack+0x8f6/0xa90 [hfi1]
[<ffffffffc05c21cc>] hfi1_kdeth_eager_rcv+0x1dc/0x210 [hfi1]
[<ffffffffc05c23ef>] ? hfi1_kdeth_expected_rcv+0x1ef/0x210 [hfi1]
[<ffffffffc0574f15>] kdeth_process_eager+0x35/0x90 [hfi1]
[<ffffffffc0575b5a>] handle_receive_interrupt_nodma_rtail+0x17a/0x2b0 [hfi1]
[<ffffffffc056a623>] receive_context_interrupt+0x23/0x40 [hfi1]
[<ffffffff9dd4a294>] __handle_irq_event_percpu+0x44/0x1c0
[<ffffffff9dd4a442>] handle_irq_event_percpu+0x32/0x80
[<ffffffff9dd4a4cc>] handle_irq_event+0x3c/0x60
[<ffffffff9dd4d27f>] handle_edge_irq+0x7f/0x150
[<ffffffff9dc2e554>] handle_irq+0xe4/0x1a0
[<ffffffff9e3795dd>] do_IRQ+0x4d/0xf0
[<ffffffff9e36b362>] common_interrupt+0x162/0x162
<EOI> [<ffffffff9dfa0f79>] ? swiotlb_map_page+0x49/0x150
[<ffffffffc05c2ed1>] hfi1_verbs_send_dma+0x291/0xb70 [hfi1]
[<ffffffffc05c2c40>] ? hfi1_wait_kmem+0xf0/0xf0 [hfi1]
[<ffffffffc05c3f26>] hfi1_verbs_send+0x126/0x2b0 [hfi1]
[<ffffffffc05ce683>] _hfi1_do_tid_send+0x1d3/0x320 [hfi1]
[<ffffffff9dcb9d4f>] process_one_work+0x17f/0x440
[<ffffffff9dcbade6>] worker_thread+0x126/0x3c0
[<ffffffff9dcbacc0>] ? manage_workers.isra.25+0x2a0/0x2a0
[<ffffffff9dcc1c31>] kthread+0xd1/0xe0
[<ffffffff9dcc1b60>] ? insert_kthread_work+0x40/0x40
[<ffffffff9e374c1d>] ret_from_fork_nospec_begin+0x7/0x21
[<ffffffff9dcc1b60>] ? insert_kthread_work+0x40/0x40

This patch fixes the issue by adjusting the resync_psn first if the flow
generation has been advanced for a pending segment.

Fixes: 9e93e967f7b4 ("IB/hfi1: Add a function to receive TID RDMA ACK packet")
Link: https://lore.kernel.org/r/20191219231920.51069.37147.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# c2be3865 25-Oct-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Calculate flow weight based on QP MTU for TID RDMA

For a TID RDMA WRITE request, a QP on the responder side could be put into
a queue when a hardware flow is not available. A RNR NAK will be returned
to the requester with a RNR timeout value based on the position of the QP
in the queue. The tid_rdma_flow_wt variable is used to calculate the
timeout value and is determined by using a MTU of 4096 at the module
loading time. This could reduce the timeout value by half from the desired
value, leading to excessive RNR retries.

This patch fixes the issue by calculating the flow weight with the real
MTU assigned to the QP.

Fixes: 07b923701e38 ("IB/hfi1: Add functions to receive TID RDMA WRITE request")
Link: https://lore.kernel.org/r/20191025195836.106825.77769.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# c1abd865 25-Oct-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Ensure r_tid_ack is valid before building TID RDMA ACK packet

The index r_tid_ack is used to indicate the next TID RDMA WRITE request to
acknowledge in the ring s_ack_queue[] on the responder side and should be
set to a valid index other than its initial value before r_tid_tail is
advanced to the next TID RDMA WRITE request and particularly before a TID
RDMA ACK is built. Otherwise, a NULL pointer dereference may result:

BUG: unable to handle kernel paging request at ffff9a32d27abff8
IP: [<ffffffffc0d87ea6>] hfi1_make_tid_rdma_pkt+0x476/0xcb0 [hfi1]
PGD 2749032067 PUD 0
Oops: 0000 1 SMP
Modules linked in: osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_zfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) ib_ipoib(OE) hfi1(OE) rdmavt(OE) nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache ib_isert iscsi_target_mod target_core_mod ib_ucm dm_mirror dm_region_hash dm_log mlx5_ib dm_mod zfs(POE) rpcrdma sunrpc rdma_ucm ib_uverbs opa_vnic ib_iser zunicode(POE) ib_umad zavl(POE) icp(POE) sb_edac intel_powerclamp coretemp rdma_cm intel_rapl iosf_mbi iw_cm libiscsi scsi_transport_iscsi kvm ib_cm iTCO_wdt mxm_wmi iTCO_vendor_support irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd zcommon(POE) znvpair(POE) pcspkr spl(OE) mei_me
sg mei ioatdma lpc_ich joydev i2c_i801 shpchp ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 mlx5_core drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ixgbe ahci ttm mlxfw ib_core libahci devlink mdio crct10dif_pclmul crct10dif_common drm ptp libata megaraid_sas crc32c_intel i2c_algo_bit pps_core i2c_core dca [last unloaded: rdmavt]
CPU: 15 PID: 68691 Comm: kworker/15:2H Kdump: loaded Tainted: P W OE ------------ 3.10.0-862.2.3.el7_lustre.x86_64 #1
Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
Workqueue: hfi0_0 _hfi1_do_tid_send [hfi1]
task: ffff9a01f47faf70 ti: ffff9a11776a8000 task.ti: ffff9a11776a8000
RIP: 0010:[<ffffffffc0d87ea6>] [<ffffffffc0d87ea6>] hfi1_make_tid_rdma_pkt+0x476/0xcb0 [hfi1]
RSP: 0018:ffff9a11776abd08 EFLAGS: 00010002
RAX: ffff9a32d27abfc0 RBX: ffff99f2d27aa000 RCX: 00000000ffffffff
RDX: 0000000000000000 RSI: 0000000000000220 RDI: ffff99f2ffc05300
RBP: ffff9a11776abd88 R08: 000000000001c310 R09: ffffffffc0d87ad4
R10: 0000000000000000 R11: 0000000000000000 R12: ffff9a117a423c00
R13: ffff9a117a423c00 R14: ffff9a03500c0000 R15: ffff9a117a423cb8
FS: 0000000000000000(0000) GS:ffff9a117e9c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff9a32d27abff8 CR3: 0000002748a0e000 CR4: 00000000001607e0
Call Trace:
[<ffffffffc0d88874>] _hfi1_do_tid_send+0x194/0x320 [hfi1]
[<ffffffffaf0b2dff>] process_one_work+0x17f/0x440
[<ffffffffaf0b3ac6>] worker_thread+0x126/0x3c0
[<ffffffffaf0b39a0>] ? manage_workers.isra.24+0x2a0/0x2a0
[<ffffffffaf0bae31>] kthread+0xd1/0xe0
[<ffffffffaf0bad60>] ? insert_kthread_work+0x40/0x40
[<ffffffffaf71f5f7>] ret_from_fork_nospec_begin+0x21/0x21
[<ffffffffaf0bad60>] ? insert_kthread_work+0x40/0x40
hfi1 0000:05:00.0: hfi1_0: reserved_op: opcode 0xf2, slot 2, rsv_used 1, rsv_ops 1
Code: 00 00 41 8b 8d d8 02 00 00 89 c8 48 89 45 b0 48 c1 65 b0 06 48 8b 83 a0 01 00 00 48 01 45 b0 48 8b 45 b0 41 80 bd 10 03 00 00 00 <48> 8b 50 38 4c 8d 7a 50 74 45 8b b2 d0 00 00 00 85 f6 0f 85 72
RIP [<ffffffffc0d87ea6>] hfi1_make_tid_rdma_pkt+0x476/0xcb0 [hfi1]
RSP <ffff9a11776abd08>
CR2: ffff9a32d27abff8

This problem can happen if a RESYNC request is received before r_tid_ack
is modified.

This patch fixes the issue by making sure that r_tid_ack is set to a valid
value before a TID RDMA ACK is built. Functions are defined to simplify
the code.

Fixes: 07b923701e38 ("IB/hfi1: Add functions to receive TID RDMA WRITE request")
Fixes: 7cf0ad679de4 ("IB/hfi1: Add a function to receive TID RDMA RESYNC packet")
Link: https://lore.kernel.org/r/20191025195830.106825.44022.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 9ed5bd7d 04-Oct-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Avoid excessive retry for TID RDMA READ request

A TID RDMA READ request could be retried under one of the following
conditions:
- The RC retry timer expires;
- A later TID RDMA READ RESP packet is received before the next
expected one.
For the latter, under normal conditions, the PSN in IB space is used
for comparison. More specifically, the IB PSN in the incoming TID RDMA
READ RESP packet is compared with the last IB PSN of a given TID RDMA
READ request to determine if the request should be retried. This is
similar to the retry logic for noraml RDMA READ request.

However, if a TID RDMA READ RESP packet is lost due to congestion,
header suppresion will be disabled and each incoming packet will raise
an interrupt until the hardware flow is reloaded. Under this condition,
each packet KDETH PSN will be checked by software against r_next_psn
and a retry will be requested if the packet KDETH PSN is later than
r_next_psn. Since each TID RDMA READ segment could have up to 64
packets and each TID RDMA READ request could have many segments, we
could make far more retries under such conditions, and thus leading to
RETRY_EXC_ERR status.

This patch fixes the issue by removing the retry when the incoming
packet KDETH PSN is later than r_next_psn. Instead, it resorts to
RC timer and normal IB PSN comparison for any request retry.

Fixes: 9905bf06e890 ("IB/hfi1: Add functions to receive TID RDMA READ response")
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Link: https://lore.kernel.org/r/20191004204035.26542.41684.stgit@awfm-01.aw.intel.com
Signed-off-by: Doug Ledford <dledford@redhat.com>


# c05fc156 11-Sep-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add traces for TID RDMA READ

This patch adds traces to debug packet loss and retry for TID RDMA READ
protocol.

Link: https://lore.kernel.org/r/20190911113041.126040.64541.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# d9d1f5e7 15-Aug-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Drop stale TID RDMA packets that cause TIDErr

In a congested fabric with adaptive routing enabled, traces show that
packets could be delivered out of order. A stale TID RDMA data packet
could lead to TidErr if the TID entries have been released by duplicate
data packets generated from retries, and subsequently erroneously force
the qp into error state in the current implementation.

Since the payload has already been dropped by hardware, the packet can
be simply dropped and it is no longer necessary to put the qp into
error state.

Fixes: 9905bf06e890 ("IB/hfi1: Add functions to receive TID RDMA READ response")
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Link: https://lore.kernel.org/r/20190815192058.105923.72324.stgit@awfm-01.aw.intel.com
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 90fdae66 15-Aug-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add additional checks when handling TID RDMA WRITE DATA packet

In a congested fabric with adaptive routing enabled, traces show that
packets could be delivered out of order, which could cause incorrect
processing of stale packets. For stale TID RDMA WRITE DATA packets that
cause KDETH EFLAGS errors, this patch adds additional checks before
processing the packets.

Fixes: d72fe7d5008b ("IB/hfi1: Add a function to receive TID RDMA WRITE DATA packet")
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Link: https://lore.kernel.org/r/20190815192051.105923.69979.stgit@awfm-01.aw.intel.com
Signed-off-by: Doug Ledford <dledford@redhat.com>


# a8adbf7d 15-Aug-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add additional checks when handling TID RDMA READ RESP packet

In a congested fabric with adaptive routing enabled, traces show that
packets could be delivered out of order, which could cause incorrect
processing of stale packets. For stale TID RDMA READ RESP packets that
cause KDETH EFLAGS errors, this patch adds additional checks before
processing the packets.

Fixes: 9905bf06e890 ("IB/hfi1: Add functions to receive TID RDMA READ response")
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Link: https://lore.kernel.org/r/20190815192045.105923.59813.stgit@awfm-01.aw.intel.com
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 35d5c8b8 15-Aug-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Unsafe PSN checking for TID RDMA READ Resp packet

When processing a TID RDMA READ RESP packet that causes KDETH EFLAGS
errors, the packet's IB PSN is checked against qp->s_last_psn and
qp->s_psn without the protection of qp->s_lock, which is not safe.

This patch fixes the issue by acquiring qp->s_lock first.

Fixes: 9905bf06e890 ("IB/hfi1: Add functions to receive TID RDMA READ response")
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Link: https://lore.kernel.org/r/20190815192039.105923.7852.stgit@awfm-01.aw.intel.com
Signed-off-by: Doug Ledford <dledford@redhat.com>


# d58c1834 15-Aug-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Drop stale TID RDMA packets

In a congested fabric with adaptive routing enabled, traces show that
the sender could receive stale TID RDMA NAK packets that contain newer
KDETH PSNs and older Verbs PSNs. If not dropped, these packets could
cause the incorrect rewinding of the software flows and the incorrect
completion of TID RDMA WRITE requests, and eventually leading to memory
corruption and kernel crash.

The current code drops stale TID RDMA ACK/NAK packets solely based
on KDETH PSNs, which may lead to erroneous processing. This patch
fixes the issue by also checking the Verbs PSN. Addition checks are
added before rewinding the TID RDMA WRITE DATA packets.

Fixes: 9e93e967f7b4 ("IB/hfi1: Add a function to receive TID RDMA ACK packet")
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Link: https://lore.kernel.org/r/20190815192033.105923.44192.stgit@awfm-01.aw.intel.com
Signed-off-by: Doug Ledford <dledford@redhat.com>


# f4d46119 14-Jul-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Drop all TID RDMA READ RESP packets after r_next_psn

When a TID sequence error occurs while receiving TID RDMA READ RESP
packets, all packets after flow->flow_state.r_next_psn should be dropped,
including those response packets for subsequent segments.

The current implementation will drop the subsequent response packets for
the segment to complete next, but may accept packets for subsequent
segments and therefore mistakenly advance the r_next_psn fields for the
corresponding software flows. This may result in failures to complete
subsequent segments after the current segment is completed.

The fix is to only use the flow pointed by req->clear_tail for checking
KDETH PSN instead of finding a flow from the request's flow array.

Fixes: b885d5be9ca1 ("IB/hfi1: Unify the software PSN check for TID RDMA READ/WRITE")
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20190715164540.74174.54702.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# dc25b239 14-Jul-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Field not zero-ed when allocating TID flow memory

The field flow->resync_npkts is added for TID RDMA WRITE request and
zero-ed when a TID RDMA WRITE RESP packet is received by the requester.
This field is used to rewind a request during retry in the function
hfi1_tid_rdma_restart_req() shared by both TID RDMA WRITE and TID RDMA
READ requests. Therefore, when a TID RDMA READ request is retried, this
field may not be initialized at all, which causes the retry to start at an
incorrect psn, leading to the drop of the retry request by the responder.

This patch fixes the problem by zeroing out the field when the flow memory
is allocated.

Fixes: 838b6fd2d9ca ("IB/hfi1: TID RDMA RcvArray programming and TID allocation")
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20190715164534.74174.6177.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 5a3113d1 17-Jun-2019 Geert Uytterhoeven <geert+renesas@glider.be>

IB/hfi1: Spelling s/statisfied/satisfied/

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# cc78076a 09-Jun-2019 Mike Marciniszyn <mike.marciniszyn@intel.com>

IB/hfi1: Correct tid qp rcd to match verbs context

The qp priv rcd pointer doesn't match the context being used for verbs
causing issues when 9B and kdeth packets are processed by different
receive contexts and hence different CPUs.

When running on different CPUs the following panic can occur:

WARNING: CPU: 3 PID: 2584 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
list_del corruption. prev->next should be ffff9a7ac31f7a30, but was ffff9a7c3bc89230
CPU: 3 PID: 2584 Comm: z_wr_iss Kdump: loaded Tainted: P OE ------------ 3.10.0-862.2.3.el7_lustre.x86_64 #1
Call Trace:
<IRQ> [<ffffffffb7b0d78e>] dump_stack+0x19/0x1b
[<ffffffffb74916d8>] __warn+0xd8/0x100
[<ffffffffb749175f>] warn_slowpath_fmt+0x5f/0x80
[<ffffffffb7768671>] __list_del_entry+0xa1/0xd0
[<ffffffffc0c7a945>] process_rcv_qp_work+0xb5/0x160 [hfi1]
[<ffffffffc0c7bc2b>] handle_receive_interrupt_nodma_rtail+0x20b/0x2b0 [hfi1]
[<ffffffffc0c70683>] receive_context_interrupt+0x23/0x40 [hfi1]
[<ffffffffb7540a94>] __handle_irq_event_percpu+0x44/0x1c0
[<ffffffffb7540c42>] handle_irq_event_percpu+0x32/0x80
[<ffffffffb7540ccc>] handle_irq_event+0x3c/0x60
[<ffffffffb7543a1f>] handle_edge_irq+0x7f/0x150
[<ffffffffb742d504>] handle_irq+0xe4/0x1a0
[<ffffffffb7b23f7d>] do_IRQ+0x4d/0xf0
[<ffffffffb7b16362>] common_interrupt+0x162/0x162
<EOI> [<ffffffffb775a326>] ? memcpy+0x6/0x110
[<ffffffffc109210d>] ? abd_copy_from_buf_off_cb+0x1d/0x30 [zfs]
[<ffffffffc10920f0>] ? abd_copy_to_buf_off_cb+0x30/0x30 [zfs]
[<ffffffffc1093257>] abd_iterate_func+0x97/0x120 [zfs]
[<ffffffffc10934d9>] abd_copy_from_buf_off+0x39/0x60 [zfs]
[<ffffffffc109b828>] arc_write_ready+0x178/0x300 [zfs]
[<ffffffffb7b11032>] ? mutex_lock+0x12/0x2f
[<ffffffffb7b11032>] ? mutex_lock+0x12/0x2f
[<ffffffffc1164d05>] zio_ready+0x65/0x3d0 [zfs]
[<ffffffffc04d725e>] ? tsd_get_by_thread+0x2e/0x50 [spl]
[<ffffffffc04d1318>] ? taskq_member+0x18/0x30 [spl]
[<ffffffffc115ef22>] zio_execute+0xa2/0x100 [zfs]
[<ffffffffc04d1d2c>] taskq_thread+0x2ac/0x4f0 [spl]
[<ffffffffb74cee80>] ? wake_up_state+0x20/0x20
[<ffffffffc115ee80>] ? zio_taskq_member.isra.7.constprop.10+0x80/0x80 [zfs]
[<ffffffffc04d1a80>] ? taskq_thread_spawn+0x60/0x60 [spl]
[<ffffffffb74bae31>] kthread+0xd1/0xe0
[<ffffffffb74bad60>] ? insert_kthread_work+0x40/0x40
[<ffffffffb7b1f5f7>] ret_from_fork_nospec_begin+0x21/0x21
[<ffffffffb74bad60>] ? insert_kthread_work+0x40/0x40

Fix by reading the map entry in the same manner as the hardware so that
the kdeth and verbs contexts match.

Cc: <stable@vger.kernel.org>
Fixes: 5190f052a365 ("IB/hfi1: Allow the driver to initialize QP priv struct")
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# cfcc048c 25-May-2019 YueHaibing <yuehaibing@huawei.com>

IB/hfi1: Remove set but not used variables 'offset' and 'fspsn'

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/infiniband/hw/hfi1/tid_rdma.c: In function tid_rdma_rcv_error:
drivers/infiniband/hw/hfi1/tid_rdma.c:2029:7: warning: variable offset set but not used [-Wunused-but-set-variable]
drivers/infiniband/hw/hfi1/tid_rdma.c: In function hfi1_rc_rcv_tid_rdma_ack:
drivers/infiniband/hw/hfi1/tid_rdma.c:4555:35: warning: variable fspsn set but not used [-Wunused-but-set-variable]

'offset' is never used since introduction in
commit d0d564a1caac ("IB/hfi1: Add functions to receive TID RDMA READ request")

'fspsn' is never used since introduciotn in
commit 9e93e967f7b4 ("IB/hfi1: Add a function to receive TID RDMA ACK packet")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 3c176c9d 11-Apr-2019 John Fleck <john.fleck@intel.com>

IB/hfi1: Remove reference to RHF.VCRCErr

The bit VCRCErr in the receive header flag is actually a
reserved field. Remove bit operations on this field.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# d737b25b 10-Apr-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Do not flush send queue in the TID RDMA second leg

When a QP is put into error state, the send queue will be flushed.
This mechanism is implemented in both the first and the second leg
of the send engine. Since the second leg is only responsible for
data transactions in the KDETH space for the TID RDMA WRITE request,
it should not perform the flushing of the send queue.

This patch removes the flushing function of the second leg, but
still keeps the bailing out of the QP if it is put into error state.

Fixes: 70dcb2e3dc6a ("IB/hfi1: Add the TID second leg send packet builder")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 747b931f 18-Mar-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Implement CCA for TID RDMA protocol

Currently, FECN handling is not implemented on TID RDMA expected receive
packets and therefore CCA can't be turned on when TID RDMA is
enabled. This patch adds the CCA support to TID RDMA protocol by:

- modifying FECN RSM rule to include kernel receive contexts

- For TID_RDMA READ RESP or TID RDMA ACK packet, a CNP will be sent out if
the FECN bit is set. For other TID RDMA packets that generate at least
one response packet, the BECN bit will be set in the first response
packet

- Copying expected packet data to destination buffer when FECN bit is set
in the TID RDMA READ RESP or TID RDMA WRITE DATA packet. In this case,
the expected packet is received as an eager packet

- Handling the TID sequence error for subsequent normal expected packets.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# b885d5be 18-Mar-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Unify the software PSN check for TID RDMA READ/WRITE

For expected packet receiving, the hfi1 hardware checks the KDETH PSN
automatically. However, when sequence error occurs, the hfi1 driver can
check the sequence instead until the hardware flow generation is reloaded.

TID RDMA READ and WRITE protocols implement similar software checking
mechanisms, but with different flags and different local variables to
store next expected PSN.

Unify the handling by using only one set of flag and local variable for
both TID RDMA READ and WRITE protocols.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 6a40693a 18-Mar-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add a function to read next expected psn from hardware flow

This patch adds a function to read next expected KDETH PSN from hardware
flow to simplify the code.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# f6f3f532 18-Mar-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Delay the release of destination mr for TID RDMA WRITE DATA

The reference of destination memory region is first obtained when TID RDMA
WRITE request is first received on the responder side. This reference is
released once all TID RDMA WRITE RESP packets are sent to the requester
side, even though not all TID RDMA WRITE DATA packets may have been
received. This early release will especially be undesired if the software
needs to access the destination memory before the last data packet is
received.

This patch delays the release of the MR until all TID RDMA DATA packets
have been received. A helper function to release the reference is also
created to simplify the code.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 7264235e 20-Feb-2019 Gustavo A. R. Silva <gustavo@embeddedor.com>

IB/hfi1: Add missing break in switch statement

Fix the following warning by adding a missing break:

drivers/infiniband/hw/hfi1/tid_rdma.c: In function ‘hfi1_tid_rdma_wqe_interlock’:
drivers/infiniband/hw/hfi1/tid_rdma.c:3251:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
switch (prev->wr.opcode) {
^~~~~~
drivers/infiniband/hw/hfi1/tid_rdma.c:3259:2: note: here
case IB_WR_RDMA_READ:
^~~~

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.

Fixes: c6c231175ccd ("IB/hfi1: Add interlock between TID RDMA WRITE and other requests")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Kaike Wan <Kaike.wan@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 34025fb0 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Prioritize the sending of ACK packets

ACK packets are generally associated with request completion and resource
release and therefore should be sent first. This patch optimizes the
send engine by using the following policies:
(1) QPs with RVT_S_ACK_PENDING bit set in qp->s_flags or qpriv->s_flags
should have their priority incremented;
(2) QPs with ACK or TID-ACK packet queued should have their priority
incremented;
(3) When a QP is queued to the wait list due to resource constraints, it
will be queued to the head if it has ACK packet to send;
(4) When selecting qps to run from the wait list, the one with the highest
priority and starve_cnt will be selected; each priority will be equivalent
to a fixed number of starve_cnt (16).

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# a05c9bdc 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add static trace for TID RDMA WRITE protocol

This patch makes the following changes to the static trace:
1. Adds the decoding of TID RDMA WRITE packets in IB header trace;
2. Adds trace events for various stages of the TID RDMA WRITE
protocol. These events provide a fine-grained control for monitoring
and debugging the hfi1 driver in the filed.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# ad00889e 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Enable TID RDMA WRITE protocol

This patch enables TID RDMA WRITE protocol by converting a qualified
RDMA WRITE request into a TID RDMA WRITE request internally:
(1) The TID RDMA cability must be enabled;
(2) The request must start on a 4K page boundary;
(3) The request length must be a multiple of 4K and must be larger or
equal to 256K.

Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# c6c23117 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add interlock between TID RDMA WRITE and other requests

This locking mechanism is designed to provent vavious memory corruption
scenarios from occurring when requests are pipelined, especially when
RDMA WRITE requests are interleaved with TID RDMA READ requests:
1. READ-AFTER-READ;
2. READ-AFTER-WRITE;
3. WRITE-AFTER-READ;
4. WRITE-AFTER-WRITE.
When memory corruption is likely, a request will be held back until
previous requests have been completed.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 3c6cb20a 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add TID RDMA WRITE functionality into RDMA verbs

This patch integrates TID RDMA WRITE protocol into normal RDMA verbs
framework. The TID RDMA WRITE protocol is an end-to-end protocol
between the hfi1 drivers on two OPA nodes that converts a qualified
RDMA WRITE request into a TID RDMA WRITE request to avoid data copying
on the responder side.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 572f0c33 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add the dual leg code

The "Second Leg" of the TID RDMA WRITE protocol deals with
the transfer of data and ack packets, which are in the KDETH
PSN space, as opposed to the IB PSN space.

Therefore, the Second Leg could be considered as a separate
state machine. As such, it is handled by a different work
queue item which is scheduled along with the normal IB state
machine work item.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 24c5bfea 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add the TID second leg ACK packet builder

This patch adds the TID packet builder for the responder side, which
contains the state machine to build TID RDMA ACK packet for either
TID RDMA WRITE DATA or TID RDMA RESYNC packets.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 70dcb2e3 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add the TID second leg send packet builder

To improve performance, the TID RDMA WRITE protocol is designed to
own a second leg to send data and ack packets in the KDETH PSN space.
This patch adds the packet builder for the requester side, which
contains the state machine to build TID RDMA WRITE DATA and TID
RDMA RESYNC packet.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 6e38fca6 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Resend the TID RDMA WRITE DATA packets

This patch adds the logic to resend TID RDMA WRITE DATA packets.
The tracking indices will be reset properly so that the correct
TID entries will be used.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 7cf0ad67 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add a function to receive TID RDMA RESYNC packet

This patch adds a function to receive TID RDMA RESYNC packet on the
responder side. The QP's hardware flow will be updated and all
allocated software flows will be updated accordingly in order to
drop all stale packets.

Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 6e391c6a 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add a function to build TID RDMA RESYNC packet

This patch adds a function to build TID RDMA RESYNC packet, which is
sent by the requester to notify the responder that no TID RDMA ACK
packet has been received for a given KDETH PSN.

Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 829eaee5 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add TID RDMA retry timer

This patch adds the TID RDMA retry timer to make sure that TID RDMA
WRITE DATA packets for a segment are received successfully by the
responder. This timer is generally armed when the last TID RDMA
WRITE DATA packet for a segment is sent out and stopped when all
TID RDMA DATA packets are acknowledged.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 9e93e967 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add a function to receive TID RDMA ACK packet

This patch adds a function to receive TID RDMA ACK packet, which could
be an acknowledge to either a TID RDMA WRITE DATA packet or an TID
RDMA RESYNC packet. For an ACK to TID RDMA WRITE DATA packet, the
request segments are completed appropriately. For an ACK to a TID
RDMA RESYNC packet, any pending segment flow information is updated
accordingly.

Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 0f75e325 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add a function to build TID RDMA ACK packet

This patch adds a function to build TID RDMA ACJ packet, which is also
in the KDETH PSN space for packet ordering. This packet is used to
acknowledge the receiving of all the TID RDMA WRITE DATA packets
before the given KDETH PSN. Similar to RC ACK packets, TID RDMA ACK
packets could also be coalesced.

Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# d72fe7d5 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add a function to receive TID RDMA WRITE DATA packet

This patch adds a function to receive TID RDMA WRITE DATA packet,
which is in the KDETH PSN space in packet ordering. Due to the use
of header suppression, software is generally only notified when
the last data packet for a segment is received. This patch also
adds code to handle KDETH EFLAGS errors for ingress TID RDMA WRITE
DATA packets.

Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 539e1908 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add a function to build TID RDMA WRITE DATA packet

This patch adds a function to build TID RDMA WRITE DATA packet.

Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 72a0ea99 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add a function to receive TID RDMA WRITE response

This patch adds a function to receive TID RDMA WRITE response.
The TID entries will be stored for encoding TID RDMA WRITE DATA
packet later.

Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 3c759e00 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add TID resource timer

This patch adds the TID resource timer, which is used by the responder
to free any TID resources that are allocated for TID RDMA WRITE request
and not returned by the requester after a reasonable time.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 38d46d36 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add a function to build TID RDMA WRITE response

This patch adds the function to build TID RDMA WRITE response. The
main role of the TID RDMA WRITE RESP packet is to send TID entries
to the requester so that they can be used to encode TID RDMA WRITE
DATA packet.

Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 07b92370 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add functions to receive TID RDMA WRITE request

This patch adds the functions to receive TID RDMA WRITE request. The
request will be stored in the QP's s_ack_queue. This patch also adds
code to handle duplicate TID RDMA WRITE request and a function to
allocate TID resources for data receiving on the responder side.

Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 4f9264d1 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add an s_acked_ack_queue pointer

The s_ack_queue is managed by two pointers into the ring:
r_head_ack_queue and s_tail_ack_queue. r_head_ack_queue is the index of
where the next received request is going to be placed and s_tail_ack_queue
is the entry of the request currently being processed. This works
perfectly fine for normal Verbs as the requests are processed one at a
time and the s_tail_ack_queue is not moved until the request that it
points to is fully completed.

In this fashion, s_tail_ack_queue constantly chases r_head_ack_queue and
the two pointers can easily be used to determine "queue full" and "queue
empty" conditions.

The detection of these two conditions are imported in determining when an
old entry can safely be overwritten with a new received request and the
resources associated with the old request be safely released.

When pipelined TID RDMA WRITE is introduced into this mix, things look
very different. r_head_ack_queue is still the point at which a newly
received request will be inserted, s_tail_ack_queue is still the
currently processed request. However, with pipelined TID RDMA WRITE
requests, s_tail_ack_queue moves to the next request once all TID RDMA
WRITE responses for that request have been sent. The rest of the protocol
for a particular request is managed by other pointers specific to TID RDMA
- r_tid_tail and r_tid_ack - which point to the entries for which the next
TID RDMA DATA packets are going to arrive and the request for which
the next TID RDMA ACK packets are to be generated, respectively.

What this means is that entries in the ring, which are "behind"
s_tail_ack_queue (entries which s_tail_ack_queue has gone past) are no
longer considered complete. This is where the problem is - a newly
received request could potentially overwrite a still active TID RDMA WRITE
request.

The reason why the TID RDMA pointers trail s_tail_ack_queue is that the
normal Verbs send engine uses s_tail_ack_queue as the pointer for the next
response. Since TID RDMA WRITE responses are processed by the normal Verbs
send engine, s_tail_ack_queue had to be moved to the next entry once all
TID RDMA WRITE response packets were sent to get the desired pipelining
between requests. Doing otherwise would mean that the normal Verbs send
engine would not be able to send the TID RDMA WRITE responses for the next
TID RDMA request until the current one is fully completed.

This patch introduces the s_acked_ack_queue index to point to the next
request to complete on the responder side. For requests other than TID
RDMA WRITE, s_acked_ack_queue should always be kept in sync with
s_tail_ack_queue. For TID RDMA WRITE request, it may fall behind
s_tail_ack_queue.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# c098bbb0 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Build TID RDMA WRITE request

This patch adds the functions to build TID RDMA WRITE request.
The work request opcode, packet opcode, and packet formats for TID
RDMA WRITE protocol are also defined in this patch.

Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 3ce5daa2 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add static trace for TID RDMA READ protocol

This patch makes the following changes to the static trace:
1. Adds the decoding of TID RDMA READ packets in IB header trace;
2. Tracks qpriv->s_flags and iow_flags in qpsleepwakeup trace;
3. Adds a new event to track RC ACK receiving;
4. Adds trace events for various stages of the TID RDMA READ
protocol. These events provide a fine-grained control for monitoring
and debugging the hfi1 driver in the filed.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# f1ab4efa 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Enable TID RDMA READ protocol

This patch enables TID RDMA READ protocol by converting a qualified
RDMA READ request into a TID RDMA READ request internally:
(1) The TID RDMA capability must be enabled;
(2) The request must start on a 4K page boundary and all receiving
buffers must start on 4K page boundaries;
(3) The request length must be a multiple of 4K and must be larger or
equal to 256K. Each receiving buffer length must be a multiple of 4K.

Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# a0b34f75 24-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add interlock between a TID RDMA request and other requests

This locking mechanism is designed to provent vavious memory corruption
scenarios from occurring when requests are pipelined, especially when
RDMA READ/WRITE requests are interleaved with TID RDMA READ/WRITE
requests:
1. READ-AFTER-READ;
2. READ-AFTER-WRITE;
3. WRITE-AFTER-READ;
When memory corruption is likely, a request will be held back until
previous requests have been completed.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 24b11923 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Integrate TID RDMA READ protocol into RC protocol

This patch integrates the TID RDMA READ protocol into the IB RC protocol.
This protocol is an end-to-end protocol between the hfi1 drivers on two
OPA nodes that converts a qualified RDMA READ request into a TID RDMA
READ request to avoid data copying on the requester side. The following
codes are added in this patch:
- Send the TID RDMA READ request;
- Complete the TID RDMA READ send request;
- Send the TID RDMA READ response;
- Complete the TID RDMA READ request;

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# b126078e 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add functions for restarting TID RDMA READ request

This patch adds functions to retry TID RDMA READ request. Since TID RDMA
READ request could be retried from any segment boundary, it requires
a number of tracking fields in various structures and those fields
should be reset properly. The qp->s_num_rd_atomic field is reset before
retry and therefore should be incremented for each new or retried
RDMA READ or atomic request.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 9905bf06 05-Feb-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add functions to receive TID RDMA READ response

This patch adds the functions to receive TID RDMA READ response. The TID
resource information in the KDETH packet header will direct the hardware
to deliver the packet payload to the user buffer automatically and the
software will handle the packet header for the last packet of a segment
as all other packet headers are suppressed by default. The TID entries
will be freed when all packets for a segment have been received. This
patch also adds the functions to handle KDETH eflag errors, including
flow sequence and generation errors, when a TID RDMA READ response
packet is received . The flow sequence error can be recovered by software
checking of the flow sequence and will disappear when the hardware flow
is programmed with a new generation number.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 1db21b50 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add a function to build TID RDMA READ response

This patch adds the function to build TID RDMA READ response packet.
The previously received TID resource information will be used to
build the KDETH packet, which will direct the delivery of packet payload
by hardware.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# d0d564a1 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add functions to receive TID RDMA READ request

This patch adds the functions to receive TID RDMA READ request. The TID
resource information will be stored and tracked. Duplicate request
will also be handled properly.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 742a3826 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add functions to build TID RDMA READ request

This patch adds the helper functions to build the TID RDMA READ request
on the requester side. The key is to allocate TID resources (TID flow
and TID entries) and send the resource information to the responder side
along with the read request. Since the TID resources are limited, each
TID RDMA READ request has to be split into segments with a default
segment size of 256K. A software flow is allocated to track the data
transaction for each segment. The work request opcode, packet opcode, and
packet formats for TID RDMA READ protocol are also defined in this patch.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 84f4a40d 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add static trace for flow and TID management functions

This patch adds the static trace for the flow and TID management
functions to help debugging in the filed.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 2f16a696 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add the counter n_tidwait

This patch adds the counter n_tidwait to count the number of times the
TID resource allocator has to wait for TID resources.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 838b6fd2 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: TID RDMA RcvArray programming and TID allocation

TID entries are used by hfi1 hardware to receive data payload from
incoming packets directly into a user buffer and thus avoid data copying
by software. This patch implements the functions for TID allocation,
freeing, and programming TID RcvArray entries in hardware for kernel
clients. TID entries are managed via lists of TID groups similar to PSM.
Furthermore, to track TID resource allocation for each request, software
flows are also allocated and freed as needed. Since software flows
consume large amount of memory for tracking TID allocation and freeing,
it is generally desirable to allocate them dynamically in the send queue
and only for TID RDMA requests, but pre-allocate them for receive queue
because the send queue could have thousands of entries while the receive
queue has only a limited number of entries.

Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 37356e78 05-Feb-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: TID RDMA flow allocation

The hfi1 hardware flow is a hardware flow-control mechanism for a KDETH
data packet that is received on a hfi1 port. It validates the packet by
checking both the generation and sequence. Each QP that uses the TID RDMA
mechanism will allocate a hardware flow from its receiving context for
any incoming KDETH data packets.

This patch implements:
(1) a function to allocate hardware flow
(2) a function to free hardware flow
(3) a function to initialize hardware flow generation for a receiving
context
(4) a wait mechanism if the hardware flow is not available
(4) a function to remove the qp from the wait queue for hardware flow
when the qp is reset or destroyed.

Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# a131d164 24-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add static trace for OPFN

This patch adds the static trace to the OPFN code and moves tid related
static trace code into a new header file.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 48a615dc 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Integrate OPFN into RC transactions

OPFN parameter negotiation allows a pair of connected RC QPs to exchange
a set of parameters in succession. This negotiation does not commence
till the first ULP request. Because OPFN operations are operations
private to the driver, they do not generate user completions or put the
QP into error when they run out of retries. This patch integrates the
OPFN protocol into the transactions of an RC QP.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# d22a207d 23-Jan-2019 Kaike Wan <kaike.wan@intel.com>

IB/hfi1: Add OPFN helper functions for TID RDMA feature

This patch adds the OPFN helper functions to initialize, encode, decode,
and reset OPFN parameters for the TID RDMA feature.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# 5190f052 28-Nov-2018 Mike Marciniszyn <mike.marciniszyn@intel.com>

IB/hfi1: Allow the driver to initialize QP priv struct

This patch adds an interface to allow the driver to initialize the QP priv
struct when the QP is created and after the qpn has been assigned. A
field is added to the QP priv struct to reference the rcd and two new
files are added to contain the function to initialize the rcd field so
that more TID RDMA related code can be added here later.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>