History log of /linux-master/Documentation/ABI/testing/debugfs-driver-habanalabs
Revision Date Author Comments
# cf0719a8 27-Sep-2023 Tomer Tayar <ttayar@habana.ai>

accel/habanalabs: update debugfs-driver-habanalabs with the device-name directory

The device debugfs directory was modified to be named as the
parent device name.
Update the paths accordingly.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>


# 38ed55bc 23-Mar-2023 Tomer Tayar <ttayar@habana.ai>

accel/habanalabs: update debugfs-driver-habanalabs with the accel path

Replace "/sys/kernel/debug/habanalabs/hl<n>/..." with
"/sys/kernel/debug/accel/<n>/...".

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>


# ebab9426 14-Aug-2023 Bjorn Helgaas <bhelgaas@google.com>

Documentation/ABI: Fix typos

Fix typos in Documentation/ABI. The changes are in descriptions or
comments where they shouldn't affect use of the ABIs.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20230814212822.193684-2-helgaas@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>


# 11669b58 30-Sep-2022 Tomer Tayar <ttayar@habana.ai>

habanalabs: add an option to control watchdog timeout via debugfs

Add an option to control the timeout value for the driver's watchdog
of the reset process. The timeout represents the amount of the user
has to close his process once he gets a device reset notification from
the driver.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>


# 08f0aa95 30-Jun-2022 Ofir Bitton <obitton@habana.ai>

habanalabs: expose only valid debugfs nodes

In case security is enabled on the device, some debugfs nodes will
fail. Hence, we do not expose them.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>


# d7bb1ac8 26-Jun-2022 Oded Gabbay <ogabbay@kernel.org>

habanalabs: add gaudi2 asic-specific code

Add the ASIC-specific code for Gaudi2. Supply (almost) all of the
function callbacks that the driver's common code need to initialize,
finalize and submit workloads to the Gaudi2 ASIC.

It also contains the code to initialize the F/W of the Gaudi2 ASIC
and to receive events from the F/W.

It contains new debugfs entry to dump razwi events. razwi is a case
where the device's engines create a transaction that reaches an
invalid destination.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>


# 70852c95 22-Jun-2022 Dafna Hirschfeld <dhirschfeld@habana.ai>

habanalabs/gaudi: use memory_scrub_val from debugfs

In the callback scrub_device_mem, use 'memory_scrub_val'
from debugfs for the scrubbing value.

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>


# 0688474e 11-Apr-2022 Dafna Hirschfeld <dhirschfeld@habana.ai>

habanalabs: add device memory scrub ability through debugfs

Add the ability to scrub the device memory with a given value.
Add file 'dram_mem_scrub_val' to set the value
and a file 'dram_mem_scrub' to scrub the dram.

This is very important to help during automated tests, when you want
the CI system to randomize the memory before training certain
DL topologies.

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# d0b59cf6 22-Mar-2022 Ohad Sharabi <osharabi@habana.ai>

habanalabs/gaudi: add debugfs to fetch internal sync status

When Gaudi device is secured the monitors data in the configuration
space is blocked from PCI access.
As we need to enable user to get sync-manager monitors registers when
debugging, this patch adds a debugfs that dumps the information to a
binary file (blob).
When a root user will trigger the dump, the driver will send request to
the f/w to fill a data structure containing dump of all monitors
registers.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# d01e6cc9 12-Jan-2022 Tomer Tayar <ttayar@habana.ai>

habanalabs: enable stop-on-error debugfs setting per ASIC

On Goya and Gaudi, the stop-on-error configuration can be set via
debugfs. However, in future devices, this configuration will always be
enabled.
Modify the debugfs node to be allowed only for ASICs that support this
dynamic configuration.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>


# 4edb4ffe 05-Jan-2022 Oded Gabbay <ogabbay@kernel.org>

habanalabs/gaudi: disable CGM permanently

Due to the need of SynapseAI to configure all TPC engines from a single
QMAN, the driver must disable CGM and never allow the user to enable
it. Otherwise, the configuration of the TPC engines will fail.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>


# a9ecddb9 14-Nov-2021 Tomer Tayar <ttayar@habana.ai>

habanalabs: align debugfs documentation to alphabetical order

Move an entry in the debugfs documentation to align with the
alphabetical order which is kept this file.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>


# 3eb7754f 12-Oct-2021 Ofir Bitton <obitton@habana.ai>

habanalabs: debugfs support for larger I2C transactions

I2C debugfs support is limited to 1 byte. We extend functionality
to more than 1 byte by using one of the pad fields as a length.
No backward compatibility issues as new F/W versions will treat 0
length as a 1 byte length transaction.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>


# 4be9fb53 02-Sep-2021 Ofir Bitton <obitton@habana.ai>

habanalabs: add debugfs node for configuring CS timeout

Command submission timeout is currently determined during driver
loading time. As some environments requires this timeout to be
modified in runtime, we introduce a new debugfs node that controls
the timeout value without the need to reload the driver.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>


# 89b21365 29-Jul-2021 Yuri Nudelman <ynudelman@habana.ai>

habanalabs: add userptr_lookup node in debugfs

It is useful to have the ability to see which user address was pinned
to which physical address during the initial mapping. We already have
all that info stored, but no means to search this data (which may be
quite large).

Signed-off-by: Yuri Nudelman <ynudelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>


# 938b793f 06-Jun-2021 Yuri Nudelman <ynudelman@habana.ai>

habanalabs: expose state dump

To improve the user's ability to debug the case where a workload that
is part of executing training/inference of a topology is getting stuck,
we need to add a 'core dump' each time a CS times-out. The 'core dump'
shall contain all relevant Sync Manager information and corresponding
fence values.

The most recent dumps shall be accessible via debugfs, under
'state_dump' node. Reading from the node will provide the oldest dump
available. Writing an integer value X will discard X dumps, starting
with the oldest one, i.e. subsequent read will now return newer
dumps.

Signed-off-by: Yuri Nudelman <ynudelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>


# 4d041216 13-Jun-2021 Yuri Nudelman <ynudelman@habana.ai>

debugfs: add skip_reset_on_timeout option

To be able to debug long-running CS better, without changing the
userspace code, we are adding a new option through debugfs interface
to skip the reset of the device in case of CS timeout.

Signed-off-by: Yuri Nudelman <ynudelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>


# 3e42d1de 13-May-2021 Carlos Bilbao <bilbao@vt.edu>

docs: typo fixes in Documentation/ABI/

Fix the following typos in the Documentation/ABI/ directory:

- In file obsolete/sysfs-cpuidle, change "obselete" for "obsolete".

- In file removed/sysfs-kernel-uids, change "propotional" for "proportional".

- In directory stable/, fix the following words: "associtated" for "associated",
"hexidecimal" for "hexadecimal", "vlue" for "value", "csed" for "caused" and
"wrtie" for "write". This updates a total of five files.

- In directory testing/, fix the following words: "subystem" for "subsystem",
"isochrnous" for "isochronous", "Desctiptors" for "Descriptors", "picutre" for
"picture", "capture" for "capture", "occured" for "ocurred", "connnected" for
"connected","agressively" for "aggressively","manufacturee" for "manufacturer"
and "transaction" for "transaction", "malformatted" for "incorrectly formated"
,"internel" for "internal", "writtento" for "written to", "specificed" for
"specified", "beyound" for "beyond", "Symetric" for "Symmetric". This updates
a total of eleven files.

Signed-off-by: Carlos Bilbao <bilbao@vt.edu>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/5710038.lOV4Wx5bFT@iron-maiden
Signed-off-by: Jonathan Corbet <corbet@lwn.net>


# 639781dc 01-Apr-2021 Oded Gabbay <ogabbay@kernel.org>

habanalabs/gaudi: add debugfs to DMA from the device

When trying to debug program, the user often needs to
dump large parts of the device's DRAM, which can reach to tens of GBs.
Because reading from the device's internal memory through the PCI BAR
is extremely slow, the debug can take hours.

Instead, we can provide the user to copy data through one of the DMA
engines. This will make the operation much faster.

Currently, only GAUDI is supported.

In GAUDI, we need to find a PCI DMA engine that is IDLE and set the
DMA as secured to be able to bypass our MMU as we currently don't
map the temporary buffer to the MMU.

Example bash one-line to dump entire HBM to file (~2 minutes):

for (( i=0x0; i < 0x800000000; i+=0x8000000 )); do \
printf '0x%x\n' $i | sudo tee /sys/kernel/debug/habanalabs/hl0/addr ; \
echo 0x8000000 | sudo tee /sys/kernel/debug/habanalabs/hl0/dma_size ; \
sudo cat /sys/kernel/debug/habanalabs/hl0/data_dma >> hbm.txt ; done

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>


# a4371c1a 23-Feb-2021 Sagiv Ozeri <sozeri@habana.ai>

habanalabs: support HW blocks vm show

Improve "vm" debugfs node to print also the virtual addresses which are
currently mapped to HW blocks in the device.

Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>


# 2f8db5a1 12-Jan-2021 Oded Gabbay <ogabbay@kernel.org>

habanalabs: update email address in sysfs/debugfs docs

Use my kernel.org address for contact point instead of my private email
address.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>


# d2b980f3 06-Jan-2021 Ofir Bitton <obitton@habana.ai>

habanalabs: add security violations dump to debugfs

In order to improve driver security debuggability, we add
security violations dump to debugfs.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>


# 54a19b4d 30-Oct-2020 Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

docs: ABI: cleanup several ABI documents

There are some ABI documents that, while they don't generate
any warnings, they have issues when parsed by get_abi.pl script
on its output result.

Address them, in order to provide a clean output.

Reviewed-by: Tom Rix <trix@redhat.com> # for fpga-manager
Reviewed-By: Kajol Jain<kjain@linux.ibm.com> # for sysfs-bus-event_source-devices-hv_gpci and sysfs-bus-event_source-devices-hv_24x7
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> #for IIO
Acked-by: Oded Gabbay <oded.gabbay@gmail.com> # for Habanalabs
Acked-by: Vaibhav Jain <vaibhav@linux.ibm.com> # for sysfs-bus-papr-pmem
Acked-by: Cezary Rojewski <cezary.rojewski@intel.com> # for catpt
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Ilya Dryomov <idryomov@gmail.com> # for rbd
Acked-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/5bc78e5b68ed1e9e39135173857cb2e753be868f.1604042072.git.mchehab+huawei@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# e38bfd30 03-Jul-2020 Oded Gabbay <oded.gabbay@gmail.com>

habanalabs: set clock gating per engine

For debugging purposes, we need to allow the root user better control of
the clock gating feature of the DMA and compute engines. Therefore, change
the clock gating debugfs interface to be bitmask instead of true/false.
Each bit represents a different engine, according to gaudi_engine_id enum.

See debugfs documentation for more details.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>


# ca62433f 08-May-2020 Oded Gabbay <oded.gabbay@gmail.com>

habanalabs: support clock gating enable/disable

In Gaudi there is a feature of clock gating certain engines.
Therefore, add this property to the device structure.

In addition, due to a limitation of this feature, the driver needs to
dynamically enable or disable this feature during run-time. Therefore, add
ASIC interface functions to enable/disable this function from the common
code.

Moreover, this feature must be turned off when the user wishes to debug the
ASIC by reading/writing registers and/or memory through the driver's
debugfs. Therefore, add an option to enable/disable clock gating via the
debugfs interface.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>


# 76cedc73 22-Mar-2020 Omer Shpigelman <oshpigelman@habana.ai>

habanalabs: remove stop-on-error flag from DMA

Stop-on-error mode in DMA is useful as it stops the transaction
immediately upon error e.g. page fault.
But it may cause the next command submission to fail as is leaves the DMA
in unstable state.
Therefore we remove the stop-on-error configuration from the DMA.
Stop-on-err is still available for debug.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>


# 5cce5146 12-Nov-2019 Moti Haimovski <mhaimovski@habana.ai>

habanalabs: add debugfs write64/read64

Allow debug user to write/read 64-bit data through debugfs.
This will expedite the dump process of the (large) internal
memories of the device done during debug.

Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>


# 06deb86a 01-Jul-2019 Tomer Tayar <ttayar@habana.ai>

habanalabs: Add debugfs node for engines status

Command submissions sent to the device are composed of command buffers
which are targeted to different device engines, like DMA and compute
entities. When a command submission gets stuck, knowing in which engine
the stuck is, is crucial for debugging.
This patch adds a debugfs node that exports this information, by
displaying the engines' various registers that assemble their idle/busy
status.
The information retrieval is based on the is_device_idle ASIC function.
The printout in this function, of the first detected busy engine, is
removed because it becomes redundant in the presence of the more
elaborated info of the new debugfs node.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>


# 4a0ce776 16-Jun-2019 Tomer Tayar <ttayar@habana.ai>

habanalabs: Allow accessing host mapped addresses via debugfs

Allows using the addr/data32 debugfs nodes to access a device VA of a
host mapped memory when the IOMMU is disabled.

Due to the possible large amount of a user host mapped memory, the
driver doesn't maintain a database with the host addresses per device VA.
When the IOMMU is disabled, this missing info is being overcome by
simply using phys_to_virt(). However, this is not useful when the IOMMU
is enabled, and thus the enforced limitation.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>


# c2164773 15-Feb-2019 Oded Gabbay <oded.gabbay@gmail.com>

habanalabs: add debugfs support

This patch adds debugfs support to the driver. It allows the user-space to
display information that is contained in the internal structures of the
driver, such as:
- active command submissions
- active user virtual memory mappings
- number of allocated command buffers

It also enables the user to perform reads and writes through Goya's PCI
bars.

Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>