#
cf0719a8 |
|
27-Sep-2023 |
Tomer Tayar <ttayar@habana.ai> |
accel/habanalabs: update debugfs-driver-habanalabs with the device-name directory The device debugfs directory was modified to be named as the parent device name. Update the paths accordingly. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
#
38ed55bc |
|
23-Mar-2023 |
Tomer Tayar <ttayar@habana.ai> |
accel/habanalabs: update debugfs-driver-habanalabs with the accel path Replace "/sys/kernel/debug/habanalabs/hl<n>/..." with "/sys/kernel/debug/accel/<n>/...". Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
#
ebab9426 |
|
14-Aug-2023 |
Bjorn Helgaas <bhelgaas@google.com> |
Documentation/ABI: Fix typos Fix typos in Documentation/ABI. The changes are in descriptions or comments where they shouldn't affect use of the ABIs. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20230814212822.193684-2-helgaas@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
11669b58 |
|
30-Sep-2022 |
Tomer Tayar <ttayar@habana.ai> |
habanalabs: add an option to control watchdog timeout via debugfs Add an option to control the timeout value for the driver's watchdog of the reset process. The timeout represents the amount of the user has to close his process once he gets a device reset notification from the driver. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
#
08f0aa95 |
|
30-Jun-2022 |
Ofir Bitton <obitton@habana.ai> |
habanalabs: expose only valid debugfs nodes In case security is enabled on the device, some debugfs nodes will fail. Hence, we do not expose them. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
#
d7bb1ac8 |
|
26-Jun-2022 |
Oded Gabbay <ogabbay@kernel.org> |
habanalabs: add gaudi2 asic-specific code Add the ASIC-specific code for Gaudi2. Supply (almost) all of the function callbacks that the driver's common code need to initialize, finalize and submit workloads to the Gaudi2 ASIC. It also contains the code to initialize the F/W of the Gaudi2 ASIC and to receive events from the F/W. It contains new debugfs entry to dump razwi events. razwi is a case where the device's engines create a transaction that reaches an invalid destination. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
#
70852c95 |
|
22-Jun-2022 |
Dafna Hirschfeld <dhirschfeld@habana.ai> |
habanalabs/gaudi: use memory_scrub_val from debugfs In the callback scrub_device_mem, use 'memory_scrub_val' from debugfs for the scrubbing value. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
#
0688474e |
|
11-Apr-2022 |
Dafna Hirschfeld <dhirschfeld@habana.ai> |
habanalabs: add device memory scrub ability through debugfs Add the ability to scrub the device memory with a given value. Add file 'dram_mem_scrub_val' to set the value and a file 'dram_mem_scrub' to scrub the dram. This is very important to help during automated tests, when you want the CI system to randomize the memory before training certain DL topologies. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
d0b59cf6 |
|
22-Mar-2022 |
Ohad Sharabi <osharabi@habana.ai> |
habanalabs/gaudi: add debugfs to fetch internal sync status When Gaudi device is secured the monitors data in the configuration space is blocked from PCI access. As we need to enable user to get sync-manager monitors registers when debugging, this patch adds a debugfs that dumps the information to a binary file (blob). When a root user will trigger the dump, the driver will send request to the f/w to fill a data structure containing dump of all monitors registers. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
d01e6cc9 |
|
12-Jan-2022 |
Tomer Tayar <ttayar@habana.ai> |
habanalabs: enable stop-on-error debugfs setting per ASIC On Goya and Gaudi, the stop-on-error configuration can be set via debugfs. However, in future devices, this configuration will always be enabled. Modify the debugfs node to be allowed only for ASICs that support this dynamic configuration. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
#
4edb4ffe |
|
05-Jan-2022 |
Oded Gabbay <ogabbay@kernel.org> |
habanalabs/gaudi: disable CGM permanently Due to the need of SynapseAI to configure all TPC engines from a single QMAN, the driver must disable CGM and never allow the user to enable it. Otherwise, the configuration of the TPC engines will fail. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
#
a9ecddb9 |
|
14-Nov-2021 |
Tomer Tayar <ttayar@habana.ai> |
habanalabs: align debugfs documentation to alphabetical order Move an entry in the debugfs documentation to align with the alphabetical order which is kept this file. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
#
3eb7754f |
|
12-Oct-2021 |
Ofir Bitton <obitton@habana.ai> |
habanalabs: debugfs support for larger I2C transactions I2C debugfs support is limited to 1 byte. We extend functionality to more than 1 byte by using one of the pad fields as a length. No backward compatibility issues as new F/W versions will treat 0 length as a 1 byte length transaction. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
#
4be9fb53 |
|
02-Sep-2021 |
Ofir Bitton <obitton@habana.ai> |
habanalabs: add debugfs node for configuring CS timeout Command submission timeout is currently determined during driver loading time. As some environments requires this timeout to be modified in runtime, we introduce a new debugfs node that controls the timeout value without the need to reload the driver. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
#
89b21365 |
|
29-Jul-2021 |
Yuri Nudelman <ynudelman@habana.ai> |
habanalabs: add userptr_lookup node in debugfs It is useful to have the ability to see which user address was pinned to which physical address during the initial mapping. We already have all that info stored, but no means to search this data (which may be quite large). Signed-off-by: Yuri Nudelman <ynudelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
#
938b793f |
|
06-Jun-2021 |
Yuri Nudelman <ynudelman@habana.ai> |
habanalabs: expose state dump To improve the user's ability to debug the case where a workload that is part of executing training/inference of a topology is getting stuck, we need to add a 'core dump' each time a CS times-out. The 'core dump' shall contain all relevant Sync Manager information and corresponding fence values. The most recent dumps shall be accessible via debugfs, under 'state_dump' node. Reading from the node will provide the oldest dump available. Writing an integer value X will discard X dumps, starting with the oldest one, i.e. subsequent read will now return newer dumps. Signed-off-by: Yuri Nudelman <ynudelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
#
4d041216 |
|
13-Jun-2021 |
Yuri Nudelman <ynudelman@habana.ai> |
debugfs: add skip_reset_on_timeout option To be able to debug long-running CS better, without changing the userspace code, we are adding a new option through debugfs interface to skip the reset of the device in case of CS timeout. Signed-off-by: Yuri Nudelman <ynudelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
#
3e42d1de |
|
13-May-2021 |
Carlos Bilbao <bilbao@vt.edu> |
docs: typo fixes in Documentation/ABI/ Fix the following typos in the Documentation/ABI/ directory: - In file obsolete/sysfs-cpuidle, change "obselete" for "obsolete". - In file removed/sysfs-kernel-uids, change "propotional" for "proportional". - In directory stable/, fix the following words: "associtated" for "associated", "hexidecimal" for "hexadecimal", "vlue" for "value", "csed" for "caused" and "wrtie" for "write". This updates a total of five files. - In directory testing/, fix the following words: "subystem" for "subsystem", "isochrnous" for "isochronous", "Desctiptors" for "Descriptors", "picutre" for "picture", "capture" for "capture", "occured" for "ocurred", "connnected" for "connected","agressively" for "aggressively","manufacturee" for "manufacturer" and "transaction" for "transaction", "malformatted" for "incorrectly formated" ,"internel" for "internal", "writtento" for "written to", "specificed" for "specified", "beyound" for "beyond", "Symetric" for "Symmetric". This updates a total of eleven files. Signed-off-by: Carlos Bilbao <bilbao@vt.edu> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/5710038.lOV4Wx5bFT@iron-maiden Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
639781dc |
|
01-Apr-2021 |
Oded Gabbay <ogabbay@kernel.org> |
habanalabs/gaudi: add debugfs to DMA from the device When trying to debug program, the user often needs to dump large parts of the device's DRAM, which can reach to tens of GBs. Because reading from the device's internal memory through the PCI BAR is extremely slow, the debug can take hours. Instead, we can provide the user to copy data through one of the DMA engines. This will make the operation much faster. Currently, only GAUDI is supported. In GAUDI, we need to find a PCI DMA engine that is IDLE and set the DMA as secured to be able to bypass our MMU as we currently don't map the temporary buffer to the MMU. Example bash one-line to dump entire HBM to file (~2 minutes): for (( i=0x0; i < 0x800000000; i+=0x8000000 )); do \ printf '0x%x\n' $i | sudo tee /sys/kernel/debug/habanalabs/hl0/addr ; \ echo 0x8000000 | sudo tee /sys/kernel/debug/habanalabs/hl0/dma_size ; \ sudo cat /sys/kernel/debug/habanalabs/hl0/data_dma >> hbm.txt ; done Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
#
a4371c1a |
|
23-Feb-2021 |
Sagiv Ozeri <sozeri@habana.ai> |
habanalabs: support HW blocks vm show Improve "vm" debugfs node to print also the virtual addresses which are currently mapped to HW blocks in the device. Signed-off-by: Sagiv Ozeri <sozeri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
#
2f8db5a1 |
|
12-Jan-2021 |
Oded Gabbay <ogabbay@kernel.org> |
habanalabs: update email address in sysfs/debugfs docs Use my kernel.org address for contact point instead of my private email address. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
#
d2b980f3 |
|
06-Jan-2021 |
Ofir Bitton <obitton@habana.ai> |
habanalabs: add security violations dump to debugfs In order to improve driver security debuggability, we add security violations dump to debugfs. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
#
54a19b4d |
|
30-Oct-2020 |
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> |
docs: ABI: cleanup several ABI documents There are some ABI documents that, while they don't generate any warnings, they have issues when parsed by get_abi.pl script on its output result. Address them, in order to provide a clean output. Reviewed-by: Tom Rix <trix@redhat.com> # for fpga-manager Reviewed-By: Kajol Jain<kjain@linux.ibm.com> # for sysfs-bus-event_source-devices-hv_gpci and sysfs-bus-event_source-devices-hv_24x7 Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> #for IIO Acked-by: Oded Gabbay <oded.gabbay@gmail.com> # for Habanalabs Acked-by: Vaibhav Jain <vaibhav@linux.ibm.com> # for sysfs-bus-papr-pmem Acked-by: Cezary Rojewski <cezary.rojewski@intel.com> # for catpt Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com> Acked-by: Ilya Dryomov <idryomov@gmail.com> # for rbd Acked-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/5bc78e5b68ed1e9e39135173857cb2e753be868f.1604042072.git.mchehab+huawei@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
e38bfd30 |
|
03-Jul-2020 |
Oded Gabbay <oded.gabbay@gmail.com> |
habanalabs: set clock gating per engine For debugging purposes, we need to allow the root user better control of the clock gating feature of the DMA and compute engines. Therefore, change the clock gating debugfs interface to be bitmask instead of true/false. Each bit represents a different engine, according to gaudi_engine_id enum. See debugfs documentation for more details. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
|
#
ca62433f |
|
08-May-2020 |
Oded Gabbay <oded.gabbay@gmail.com> |
habanalabs: support clock gating enable/disable In Gaudi there is a feature of clock gating certain engines. Therefore, add this property to the device structure. In addition, due to a limitation of this feature, the driver needs to dynamically enable or disable this feature during run-time. Therefore, add ASIC interface functions to enable/disable this function from the common code. Moreover, this feature must be turned off when the user wishes to debug the ASIC by reading/writing registers and/or memory through the driver's debugfs. Therefore, add an option to enable/disable clock gating via the debugfs interface. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
#
76cedc73 |
|
22-Mar-2020 |
Omer Shpigelman <oshpigelman@habana.ai> |
habanalabs: remove stop-on-error flag from DMA Stop-on-error mode in DMA is useful as it stops the transaction immediately upon error e.g. page fault. But it may cause the next command submission to fail as is leaves the DMA in unstable state. Therefore we remove the stop-on-error configuration from the DMA. Stop-on-err is still available for debug. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
#
5cce5146 |
|
12-Nov-2019 |
Moti Haimovski <mhaimovski@habana.ai> |
habanalabs: add debugfs write64/read64 Allow debug user to write/read 64-bit data through debugfs. This will expedite the dump process of the (large) internal memories of the device done during debug. Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
#
06deb86a |
|
01-Jul-2019 |
Tomer Tayar <ttayar@habana.ai> |
habanalabs: Add debugfs node for engines status Command submissions sent to the device are composed of command buffers which are targeted to different device engines, like DMA and compute entities. When a command submission gets stuck, knowing in which engine the stuck is, is crucial for debugging. This patch adds a debugfs node that exports this information, by displaying the engines' various registers that assemble their idle/busy status. The information retrieval is based on the is_device_idle ASIC function. The printout in this function, of the first detected busy engine, is removed because it becomes redundant in the presence of the more elaborated info of the new debugfs node. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
#
4a0ce776 |
|
16-Jun-2019 |
Tomer Tayar <ttayar@habana.ai> |
habanalabs: Allow accessing host mapped addresses via debugfs Allows using the addr/data32 debugfs nodes to access a device VA of a host mapped memory when the IOMMU is disabled. Due to the possible large amount of a user host mapped memory, the driver doesn't maintain a database with the host addresses per device VA. When the IOMMU is disabled, this missing info is being overcome by simply using phys_to_virt(). However, this is not useful when the IOMMU is enabled, and thus the enforced limitation. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
#
c2164773 |
|
15-Feb-2019 |
Oded Gabbay <oded.gabbay@gmail.com> |
habanalabs: add debugfs support This patch adds debugfs support to the driver. It allows the user-space to display information that is contained in the internal structures of the driver, such as: - active command submissions - active user virtual memory mappings - number of allocated command buffers It also enables the user to perform reads and writes through Goya's PCI bars. Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|