History log of /linux-master/drivers/crypto/intel/qat/qat_common/adf_aer.c
Revision Date Author Comments
# 7d42e097 09-Feb-2024 Damian Muszynski <damian.muszynski@intel.com>

crypto: qat - resolve race condition during AER recovery

During the PCI AER system's error recovery process, the kernel driver
may encounter a race condition with freeing the reset_data structure's
memory. If the device restart will take more than 10 seconds the function
scheduling that restart will exit due to a timeout, and the reset_data
structure will be freed. However, this data structure is used for
completion notification after the restart is completed, which leads
to a UAF bug.

This results in a KFENCE bug notice.

BUG: KFENCE: use-after-free read in adf_device_reset_worker+0x38/0xa0 [intel_qat]
Use-after-free read at 0x00000000bc56fddf (in kfence-#142):
adf_device_reset_worker+0x38/0xa0 [intel_qat]
process_one_work+0x173/0x340

To resolve this race condition, the memory associated to the container
of the work_struct is freed on the worker if the timeout expired,
otherwise on the function that schedules the worker.
The timeout detection can be done by checking if the caller is
still waiting for completion or not by using completion_done() function.

Fixes: d8cba25d2c68 ("crypto: qat - Intel(R) QAT driver framework")
Cc: <stable@vger.kernel.org>
Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# 9567d3dc 02-Feb-2024 Mun Chun Yep <mun.chun.yep@intel.com>

crypto: qat - improve aer error reset handling

Rework the AER reset and recovery flow to take into account root port
integrated devices that gets reset between the error detected and the
slot reset callbacks.

In adf_error_detected() the devices is gracefully shut down. The worker
threads are disabled, the error conditions are notified to listeners and
through PFVF comms and finally the device is reset as part of
adf_dev_down().

In adf_slot_reset(), the device is brought up again. If SRIOV VFs were
enabled before reset, these are re-enabled and VFs are notified of
restarting through PFVF comms.

Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# f5419a42 02-Feb-2024 Damian Muszynski <damian.muszynski@intel.com>

crypto: qat - add auto reset on error

Expose the `auto_reset` sysfs attribute to configure the driver to reset
the device when a fatal error is detected.

When auto reset is enabled, the driver resets the device when it detects
either an heartbeat failure or a fatal error through an interrupt.

This patch is based on earlier work done by Shashank Gupta.

Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# 4469f9b2 02-Feb-2024 Mun Chun Yep <mun.chun.yep@intel.com>

crypto: qat - re-enable sriov after pf reset

When a Physical Function (PF) is reset, SR-IOV gets disabled, making the
associated Virtual Functions (VFs) unavailable. Even after reset and
using pci_restore_state, VFs remain uncreated because the numvfs still
at 0. Therefore, it's necessary to reconfigure SR-IOV to re-enable VFs.

This commit introduces the ADF_SRIOV_ENABLED configuration flag to cache
the SR-IOV enablement state. SR-IOV is only re-enabled if it was
previously configured.

This commit also introduces a dedicated workqueue without
`WQ_MEM_RECLAIM` flag for enabling SR-IOV during Heartbeat and CPM error
resets, preventing workqueue flushing warning.

This patch is based on earlier work done by Shashank Gupta.

Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# ec26f8e6 02-Feb-2024 Mun Chun Yep <mun.chun.yep@intel.com>

crypto: qat - update PFVF protocol for recovery

Update the PFVF logic to handle restart and recovery. This adds the
following functions:

* adf_pf2vf_notify_fatal_error(): allows the PF to notify VFs that the
device detected a fatal error and requires a reset. This sends to
VF the event `ADF_PF2VF_MSGTYPE_FATAL_ERROR`.
* adf_pf2vf_wait_for_restarting_complete(): allows the PF to wait for
`ADF_VF2PF_MSGTYPE_RESTARTING_COMPLETE` events from active VFs
before proceeding with a reset.
* adf_pf2vf_notify_restarted(): enables the PF to notify VFs with
an `ADF_PF2VF_MSGTYPE_RESTARTED` event after recovery, indicating that
the device is back to normal. This prompts VF drivers switch back to
use the accelerator for workload processing.

These changes improve the communication and synchronization between PF
and VF drivers during system restart and recovery processes.

Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# 758a0087 02-Feb-2024 Furong Zhou <furong.zhou@intel.com>

crypto: qat - disable arbitration before reset

Disable arbitration to avoid new requests to be processed before
resetting a device.

This is needed so that new requests are not fetched when an error is
detected.

Signed-off-by: Furong Zhou <furong.zhou@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# ae508d7a 02-Feb-2024 Furong Zhou <furong.zhou@intel.com>

crypto: qat - add fatal error notify method

Add error notify method to report a fatal error event to all the
subsystems registered. In addition expose an API,
adf_notify_fatal_error(), that allows to trigger a fatal error
notification asynchronously in the context of a workqueue.

This will be invoked when a fatal error is detected by the ISR or
through Heartbeat.

Signed-off-by: Furong Zhou <furong.zhou@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# 01aed663 09-Oct-2023 Svyatoslav Pankratov <svyatoslav.pankratov@intel.com>

crypto: qat - fix double free during reset

There is no need to free the reset_data structure if the recovery is
unsuccessful and the reset is synchronous. The function
adf_dev_aer_schedule_reset() handles the cleanup properly. Only
asynchronous resets require such structure to be freed inside the reset
worker.

Fixes: d8cba25d2c68 ("crypto: qat - Intel(R) QAT driver framework")
Signed-off-by: Svyatoslav Pankratov <svyatoslav.pankratov@intel.com>
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# a4b16dad 28-Mar-2023 Tom Zanussi <tom.zanussi@linux.intel.com>

crypto: qat - Move driver to drivers/crypto/intel/qat

With the growing number of Intel crypto drivers, it makes sense to
group them all into a single drivers/crypto/intel/ directory.

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>