History log of /linux-master/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h
Revision Date Author Comments
# 8a45c4f9 06-Oct-2023 Jie Wang <wangjie125@huawei.com>

net: hns3: add vf fault detect support

Currently hns3 driver supports vf fault detect feature. Several ras caused
by VF resources don't need to do PF function reset for recovery. The driver
only needs to reset the specified VF.

So this patch adds process in ras module. New process will get detailed
information about ras and do the most correct measures based on these
accurate information.

Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://lore.kernel.org/r/20231007031215.1067758-3-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# da3fea80 24-Oct-2021 Jiaran Zhang <zhangjiaran@huawei.com>

net: hns3: add error recovery module and type for himac

This patch adds himac error recovery module, link_error type and
ptp_error type for himac.

Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b566ef60 24-Oct-2021 Weihang Li <liweihang@huawei.com>

net: hns3: add new ras error type for roce

This patch adds one ras error of bus related for roce, this error
including RRESP/BRESP and read poison error.

Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 60484103 19-Oct-2021 Jiaran Zhang <zhangjiaran@huawei.com>

net: hns3: Add configuration of TM QCN error event

Add configuration of interrupt type and fifo interrupt enable of TM QCN
error event if enabled, otherwise this event will not be reported when
there is error.

Fixes: d914971df022 ("net: hns3: remove redundant query in hclge_config_tm_hw_err_int()")
Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1c360a4a 08-Jun-2021 Jiaran Zhang <zhangjiaran@huawei.com>

net: hns3: add error handling compatibility during initialization

During initialization, the driver logs and clears the hw errors that
already occurred. For device supports imp-handle ras capability, it
needs handle different error status, otherwise it may cause wrong reset.

So fix it by adding a new processing branch.

Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8a95e360 08-Jun-2021 Jiaran Zhang <zhangjiaran@huawei.com>

net: hns3: update error recovery module and type

Update error recovery module and type for RoCE.

The enumeration values of module names and error types are not sorted
in sequence. If use the current printing mode, they cannot be correctly
printed.

Use the index mode, If mod_id and type_id match the enumerated value,
display the corresponding information.

Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2e2deee7 08-Jun-2021 Jiaran Zhang <zhangjiaran@huawei.com>

net: hns3: add the RAS compatibility adaptation solution

To adapt to hardware modification and ensure that the driver is
compatible with the original error handling content, we need to add the
RAS compatibility adaptation solution.

Add a processing branch to the driver during error handling. In the new
processing branch, NIC fault information is integrated by the IMP. An
interaction command is added between the driver and IMP to query
and clear the fault source and interrupt source. The IMP integrates
error information and reports the highest reset level to the driver.

Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2867298d 29-Apr-2021 Yufeng Mo <moyufeng@huawei.com>

net: hns3: fix incorrect configuration for igu_egu_hw_err

According to the UM, the type and enable status of igu_egu_hw_err
should be configured separately. Currently, the type field is
incorrect when disable this error. So fix it by configuring these
two fields separately.

Fixes: bf1faf9415dd ("net: hns3: Add enable and process hw errors from IGU, EGU and NCSI")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5705b451 09-May-2020 Huazhong Tan <tanhuazhong@huawei.com>

net: hns3: remove a redundant register macro definition

HCLGE_MISC_VECTOR_INT_STS and HCLGE_VECTOR_PF_OTHER_INT_STS_REG
both represent the misc interrupt status register(0x20800), so
removes HCLGE_VECTOR_PF_OTHER_INT_STS_REG and replaces it with
HCLGE_MISC_VECTOR_INT_STS.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# a83d2961 28-Aug-2019 Weihang Li <liweihang@hisilicon.com>

net: hns3: implement .process_hw_error for hns3 client

When hardware or IMP get specified error it may need the client
to take some special operations.

This patch implements the hns3 client's process_hw_errorx.

Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Reviewed-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 987b4ae7 20-Jun-2019 Weihang Li <liweihang@hisilicon.com>

net: hns3: add check to number of buffer descriptors

This patch adds check to number of bds before we allocate memory for
them. If we get an invalid bd num in some cases, it will cause a memory
overflow.

Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9f65e5ef 13-Jun-2019 Weihang Li <liweihang@hisilicon.com>

net: hns3: some changes of MSI-X bits in PPU(RCB)

This patch modifies print message of rx_q_search_miss from error to dfx to
prevent misleading users, because this interrupt may occur if we receive
packets during initialization of HNS3 driver.
Otherwise, this patch masks 28th bit of PPU_MPF_ABNORMAL_SRC2 which is now
meaningless.

Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e4193e24 13-Jun-2019 Shiju Jose <shiju.jose@huawei.com>

net: hns3: process H/W errors occurred before HNS dev initialization

Presently the HNS driver enables the HNS H/W error interrupts after
the dev initialization is completed. However some exceptions such as
NCSI errors can occur when the network port driver is not loaded
and those errors required reporting to the BMC.
Therefore the firmware enabled all the HNS ras error interrupts
before the driver is loaded. And in some cases, there will be some
H/W errors remained unclear before reboot. Thus the HNS driver needs
to process and recover those hw errors occurred before HNS driver is
initialized.

This patch adds processing of the HNS hw errors(RAS and MSI-X)
which occurred before the driver initialization. For RAS, because
they are enabled by firmware, so we can detect specific bits, then
log and clear them. But for MSI-X which can not be enabled before
open vector0 irq, we can't detect the specific error bits, so we
just write 1 to all interrupt source registers to clear.

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0cd86182 06-Jun-2019 Weihang Li <liweihang@hisilicon.com>

net: hns3: trigger VF reset if a VF has an over_8bd_nfe_err

We trigger PF reset when a RAS error of NIC named over_8bd_nfe_err
occurred before. But it is possible that a VF causes that error, it's
reasonable to trigger VF reset instead of PF reset in this case.
This patch add detection of vf_id if a over_8bd_nfe_err occurs, if
vf_id is 0, we trigger PF reset. Otherwise, we will trigger VF reset
on the VF with error.

Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 238882c8 06-Jun-2019 Xiaofei Tan <tanxiaofei@huawei.com>

net: hns3: log detail error info of ROCEE ECC and AXI errors

This patch logs detail error info of ROCEE ECC and AXI errors for
debug purpose, and remove unnecessary reset for ROCEE overflow
errors.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 00ea6e5f 02-Jun-2019 Weihang Li <liweihang@hisilicon.com>

net: hns3: delay and separate enabling of NIC and ROCE HW errors

All RAS and MSI-X should be enabled just in the final stage of HNS3
initialization. It means that they should be enabled in
hclge_init_xxx_client_instance instead of hclge_ae_dev(). Especially
MSI-X, if it is enabled before opening vector0 IRQ, there are some
chances that a MSI-X error will cause failure on initialization of
NIC client instane. So this patch delays enabling of HW errors.
Otherwise, we also separate enabling of ROCE RAS from NIC, because
it's not reasonable to enable ROCE RAS if we even don't have a ROCE
driver.

Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6aa5d07d 02-Jun-2019 Weihang Li <liweihang@hisilicon.com>

net: hns3: add handling of two bits in MAC tunnel interrupts

LINK_UP and LINK_DOWN are two bits of MAC tunnel interrupts, but previous
HNS3 driver didn't handle them. If they were enabled, value of these two
bits will change during link down and link up, which will cause HNS3
driver keep receiving IRQ but can't handle them.

This patch adds handling of these two bits of interrupts, we will record
and clear them as what we do to other MAC tunnel interrupts.

Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a6345787 18-Apr-2019 Weihang Li <liweihang@hisilicon.com>

net: hns3: Add handling of MAC tunnel interruption

MAC tnl interruptions are different from other type of RAS and MSI-X
errors, because some bits, such as OVF/LR/RF will occur during link up
and down.

The drivers should clear status of all MAC tnl interruption bits but
shouldn't print any message that would mislead the users.

In case that link down and re-up in a short time because of some reasons,
we record when they occurred, and users can query them by debugfs.

Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c41e672d 13-Apr-2019 Weihang Li <liweihang@hisilicon.com>

net: hns3: set dividual reset level for all RAS and MSI-X errors

According to hardware description, reset level that should be
triggered are not consistent in a module. For example, in SSU
common errors, the first two bits has no need to do reset,
but the other bits need global reset.

This patch sets separate reset level for all RAS and MSI-X
interrupts by adding a reset_lvel field in struct hclge_hw_error,
and fixes some incorrect reset level.

Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d1f55d6b 19-Feb-2019 Weihang Li <liweihang@hisilicon.com>

net: hns3: enable 8~11th bit of mac common msi-x error

These bits are enabled now and have been test.

Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 747fc3f3 19-Feb-2019 Weihang Li <liweihang@hisilicon.com>

net: hns3: some bugfix of ppu(rcb) ras errors

The 3rd and 4th of PPU(RCB) PF Abnormal is RAS errors instead of MSI-X
like other bits. This patch adds process of handling and logging this
two bits. Otherwise, this patch modifies print message of 28th and 29th
bit of PPU MPF Abnormal errors, which keep same with other errors now.

Fixes: f69b10b317f9 ("net: hns3: handle hw errors of PPU(RCB)")
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 630ba007 07-Dec-2018 Shiju Jose <shiju.jose@huawei.com>

net: hns3: add handling of RDMA RAS errors

This patch handles the RDMA RAS errors.
1. Enable RAS interrupt, print error detail info and clear error status.
2. Do CORE reset to recovery when these non-fatal errors happened.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c3529177 07-Dec-2018 Shiju Jose <shiju.jose@huawei.com>

net: hns3: handle hw errors of SSU

This patch enables and handles hw errors of the Storage Switch Unit(SSU).

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f69b10b3 07-Dec-2018 Shiju Jose <shiju.jose@huawei.com>

net: hns3: handle hw errors of PPU(RCB)

This patch enables and handles hw RAS and MSIx errors of PPU(RCB).

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7838f908 07-Dec-2018 Shiju Jose <shiju.jose@huawei.com>

net: hns3: add handling of hw errors of MAC

This patch adds enable and handling of hw errors of
the MAC block.

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f6162d44 07-Dec-2018 Salil Mehta <salil.mehta@huawei.com>

net: hns3: add handling of hw errors reported through MSIX

This patch adds handling for HNS3 hardware errors(non-standard)
which are reported through MSIX interrupts and not through
PCIe AER channel.

These MSIX reported hardware errors are handled using common
misc. interrupt handler. Hardware error related registers
cannot be cleared in context to the interrupt received as
they require *heavy* access to hardware using IMP(Integrated
Mangement Processor) commands. Hence, we defer the clearing
of such error events till later time.

Since, we have defered exact identification of errors we
will have to defer the level of receovery/reset which
might be required. Hence, a new reset type UNKNOWN reset
has been introduced which effectively defers the assertion
of the reset till we get hold of kind of errors at later
time.

Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 332fbf57 07-Dec-2018 Shiju Jose <shiju.jose@huawei.com>

net: hns3: add handling of hw ras errors using new set of commands

1. This patch adds handling of hw ras errors using new set of
common commands.
2. Updated the error message tables to match the register's name and
error status returned by the commands.

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 381c356e 07-Dec-2018 Shiju Jose <shiju.jose@huawei.com>

net: hns3: rename process_hw_error function

This patch renames process_hw_error function to
handle_hw_ras_error function to match the purpose
of the function. This is because hw errors reported through
ras and msix interrupts will be handled separately.

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f3fa4a94 07-Dec-2018 Shiju Jose <shiju.jose@huawei.com>

net: hns3: re-enable error interrupts on hw reset

This patch adds calling hclge_hw_error_set_state function
to re-enable the error interrupts those will be disabled on
the hw reset.

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 98da4027 07-Dec-2018 Shiju Jose <shiju.jose@huawei.com>

net: hns3: rename enable error interrupt functions

This patch
- renames the enable error interrupt functions.
The reason is that these functions
are used for both enable and disable error interrupts.

- removes redundant logs from the enable error interrupt functions.

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fe0f7d69 07-Dec-2018 Shiju Jose <shiju.jose@huawei.com>

net: hns3: remove existing process error functions and reorder hw_blk table

1.The command interface for queryng and clearing hw errors is
changed, which requires the new process error functions to be added.
This patch removes all the current process error functions and
associated definitions. The new functions to handle ras errors
would be added in this patch set.

2. Fixed order issue of the hw_blk table.

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 01865a50 19-Oct-2018 Shiju Jose <shiju.jose@huawei.com>

net: hns3: Add enable and process hw errors of TM scheduler

This patch enables and process hw errors of TM scheduler and
QCN(Quantized Congestion Control).

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# da2d072a 19-Oct-2018 Shiju Jose <shiju.jose@huawei.com>

net: hns3: Add enable and process hw errors from PPP

This patch enables and process hw errors from the
PPP(Programmable Packet Process) block.

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bf1faf94 19-Oct-2018 Shiju Jose <shiju.jose@huawei.com>

net: hns3: Add enable and process hw errors from IGU, EGU and NCSI

This patch adds enable and processing of hw errors from IGU(Ingress Unit),
EGU(Egress Unit) and NCSI(Network Controller Sideband Interface).

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6d67ee9a 19-Oct-2018 Shiju Jose <shiju.jose@huawei.com>

net: hns3: Add enable and process common ecc errors

This patch adds enable and processing of ecc errors from
common HNS blocks, CMDQ(Command Queue),
IMP(Integrated Management Processor) and TQP(Task Queue Pair).

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 99714195 19-Oct-2018 Shiju Jose <shiju.jose@huawei.com>

net: hns3: Add support to enable and disable hw errors

This patch adds functions to enable and disable hw errors.

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5a9f0eac 19-Oct-2018 Shiju Jose <shiju.jose@huawei.com>

net: hns3: Add PCIe AER callback error_detected

Set of hw errors occurred in the HNS3 are reported to the
hns3 driver through PCIe AER and RAS.The error info will be
processed and appropriately recovered.
This patch adds error_detected callback and error processing.

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>