#
bf729988 |
|
11-Apr-2024 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: Restore mistakenly dropped parts in register devlink flow Code parts from cited commit were mistakenly dropped while rebasing before submission. Add them here. Fixes: c6e77aa9dd82 ("net/mlx5: Register devlink first under devlink lock") Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240411115444.374475-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
aa4ac90d |
|
11-Apr-2024 |
Tariq Toukan <tariqt@nvidia.com> |
net/mlx5: SD, Handle possible devcom ERR_PTR Check if devcom holds an error pointer and return immediately. This fixes Smatch static checker warning: drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c:221 sd_register() error: 'devcom' dereferencing possible ERR_PTR() Enhance mlx5_devcom_register_component() so it stops returning NULL, making it easier for its callers. Fixes: d3d057666090 ("net/mlx5: SD, Implement devcom communication and primary election") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/all/f09666c8-e604-41f6-958b-4cc55c73faf9@gmail.com/T/ Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Link: https://lore.kernel.org/r/20240411115444.374475-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
c6e77aa9 |
|
09-Apr-2024 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: Register devlink first under devlink lock In case device is having a non fatal FW error during probe, the driver will report the error to user via devlink. This will trigger a WARN_ON, since mlx5 is calling devlink_register() last. In order to avoid the WARN_ON[1], change mlx5 to invoke devl_register() first under devlink lock. [1] WARNING: CPU: 5 PID: 227 at net/devlink/health.c:483 devlink_recover_notify.constprop.0+0xb8/0xc0 CPU: 5 PID: 227 Comm: kworker/u16:3 Not tainted 6.4.0-rc5_for_upstream_min_debug_2023_06_12_12_38 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: mlx5_health0000:08:00.0 mlx5_fw_reporter_err_work [mlx5_core] RIP: 0010:devlink_recover_notify.constprop.0+0xb8/0xc0 Call Trace: <TASK> ? __warn+0x79/0x120 ? devlink_recover_notify.constprop.0+0xb8/0xc0 ? report_bug+0x17c/0x190 ? handle_bug+0x3c/0x60 ? exc_invalid_op+0x14/0x70 ? asm_exc_invalid_op+0x16/0x20 ? devlink_recover_notify.constprop.0+0xb8/0xc0 devlink_health_report+0x4a/0x1c0 mlx5_fw_reporter_err_work+0xa4/0xd0 [mlx5_core] process_one_work+0x1bb/0x3c0 ? process_one_work+0x3c0/0x3c0 worker_thread+0x4d/0x3c0 ? process_one_work+0x3c0/0x3c0 kthread+0xc6/0xf0 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Fixes: cf530217408e ("devlink: Notify users when objects are accessible") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240409190820.227554-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
bcad0e53 |
|
26-Jan-2024 |
Moshe Shemesh <moshe@nvidia.com> |
net/mlx5: Return specific error code for timeout on wait_fw_init The function wait_fw_init() returns same error code either if it breaks waiting due to timeout or other reason. Thus, the function callers print error message on timeout without checking error type. Return different error code for different failure reason and print error message accordingly on wait_fw_init(). Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
10b49d0e |
|
11-Oct-2023 |
Justin Stitt <justinstitt@google.com> |
net/mlx5: simplify mlx5_set_driver_version string assignments In total, just assigning this version string takes: (1) strncpy()'s (5) strlen()'s (3) strncat()'s (1) snprintf()'s (4) max_t()'s Moreover, `strncpy` is deprecated [1] and `strncat` really shouldn't be used either [2]. With this in mind, let's simply use a single `snprintf`. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://elixir.bootlin.com/linux/v6.6-rc5/source/include/linux/fortify-string.h#L448 [2] Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
e534552c |
|
12-Oct-2023 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: Refactor LAG peer device lookout bus logic to mlx5 devcom LAG peer device lookout bus logic required the usage of global lock, mlx5_intf_mutex. As part of the effort to remove this global lock, refactor LAG peer device lookout to use mlx5 devcom layer. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
0d293714 |
|
21-Sep-2023 |
Patrisious Haddad <phaddad@nvidia.com> |
RDMA/mlx5: Send events from IB driver about device affiliation state Send blocking events from IB driver whenever the device is done being affiliated or if it is removed from an affiliation. This is useful since now the EN driver can register to those event and know when a device is affiliated or not. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Link: https://lore.kernel.org/r/a7491c3e483cfd8d962f5f75b9a25f253043384a.1695296682.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
85b47dc4 |
|
13-Sep-2023 |
Jiri Pirko <jiri@resnulli.us> |
net/mlx5: Disable eswitch as the first thing in mlx5_unload() The eswitch disable call does removal of all representors. Do that before clearing the SF device table and maintain the same flow as during SF devlink port removal, where the representor is removed before the actual SF is removed. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a41cb591 |
|
11-Jul-2023 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: Remove unused MAX HCA capabilities Each device cap has two modes: MAX and CUR. The driver maintains a cache of both modes of the capabilities. For most device caps, the MAX cap mode is never used. Hence, remove all driver queries of the MAX mode of the said caps as well as their helper MACROs. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
0b4eb603 |
|
02-Jan-2022 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: Remove unused CAPs mlx5 driver queries the device for VECTOR_CALC and SHAMPO caps, but there isn't any user who requires them. As well as, MLX5_MCAM_REGS_0x9080_0x90FF is queried but not used. Thus, drop all usages and definitions of the mentioned caps above. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
1f507e80 |
|
07-Aug-2023 |
Adham Faris <afaris@nvidia.com> |
net/mlx5: Expose NIC temperature via hardware monitoring kernel API Expose NIC temperature by implementing hwmon kernel API, which turns current thermal zone kernel API to redundant. For each one of the supported and exposed thermal diode sensors, expose the following attributes: 1) Input temperature. 2) Highest temperature. 3) Temperature label: Depends on the firmware capability, if firmware doesn't support sensors naming, the fallback naming convention would be: "sensorX", where X is the HW spec (MTMP register) sensor index. 4) Temperature critical max value: refers to the high threshold of Warning Event. Will be exposed as `tempY_crit` hwmon attribute (RO attribute). For example for ConnectX5 HCA's this temperature value will be 105 Celsius, 10 degrees lower than the HW shutdown temperature). 5) Temperature reset history: resets highest temperature. For example, for dualport ConnectX5 NIC with a single IC thermal diode sensor will have 2 hwmon directories (one for each PCI function) under "/sys/class/hwmon/hwmon[X,Y]". Listing one of the directories above (hwmonX/Y) generates the corresponding output below: $ grep -H -d skip . /sys/class/hwmon/hwmon0/* Output ======================================================================= /sys/class/hwmon/hwmon0/name:mlx5 /sys/class/hwmon/hwmon0/temp1_crit:105000 /sys/class/hwmon/hwmon0/temp1_highest:48000 /sys/class/hwmon/hwmon0/temp1_input:46000 /sys/class/hwmon/hwmon0/temp1_label:asic grep: /sys/class/hwmon/hwmon0/temp1_reset_history: Permission denied In addition, displaying the sensors data via lm_sensors generates the corresponding output below: $ sensors Output ======================================================================= mlx5-pci-0800 Adapter: PCI adapter asic: +46.0°C (crit = +105.0°C, highest = +48.0°C) mlx5-pci-0801 Adapter: PCI adapter asic: +46.0°C (crit = +105.0°C, highest = +48.0°C) CC: Jean Delvare <jdelvare@suse.com> Signed-off-by: Adham Faris <afaris@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230807180507.22984-3-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
06cd555f |
|
18-Jan-2023 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: split mlx5_cmd_init() to probe and reload routines There is no need to destroy and allocate cmd SW structs during reload, this is time consuming for no reason. Hence, split mlx5_cmd_init() to probe and reload routines. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
88d162b4 |
|
03-May-2023 |
Roi Dayan <roid@nvidia.com> |
net/mlx5: Devcom, Infrastructure changes Update devcom infrastructure to be more generic, without depending on max supported ports definition or a device guid, and also more encapsulated so callers don't need to pass the register devcom component id per event call. Signed-off-by: Eli Cohen <elic@nvidia.com> Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
aab8e1a2 |
|
23-Jul-2023 |
Moshe Shemesh <moshe@nvidia.com> |
net/mlx5: Reload auxiliary devices in pci error handlers Handling pci errors should fully teardown and load back auxiliary devices, same as done through mlx5 health recovery flow. Fixes: 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in case of PCI device suspend") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
53d737df |
|
25-Jun-2023 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: Unregister devlink params in case interface is down Currently, in case an interface is down, mlx5 driver doesn't unregister its devlink params, which leads to this WARN[1]. Fix it by unregistering devlink params in that case as well. [1] [ 295.244769 ] WARNING: CPU: 15 PID: 1 at net/core/devlink.c:9042 devlink_free+0x174/0x1fc [ 295.488379 ] CPU: 15 PID: 1 Comm: shutdown Tainted: G S OE 5.15.0-1017.19.3.g0677e61-bluefield #g0677e61 [ 295.509330 ] Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS 4.2.0.12761 Jun 6 2023 [ 295.543096 ] pc : devlink_free+0x174/0x1fc [ 295.551104 ] lr : mlx5_devlink_free+0x18/0x2c [mlx5_core] [ 295.561816 ] sp : ffff80000809b850 [ 295.711155 ] Call trace: [ 295.716030 ] devlink_free+0x174/0x1fc [ 295.723346 ] mlx5_devlink_free+0x18/0x2c [mlx5_core] [ 295.733351 ] mlx5_sf_dev_remove+0x98/0xb0 [mlx5_core] [ 295.743534 ] auxiliary_bus_remove+0x2c/0x50 [ 295.751893 ] __device_release_driver+0x19c/0x280 [ 295.761120 ] device_release_driver+0x34/0x50 [ 295.769649 ] bus_remove_device+0xdc/0x170 [ 295.777656 ] device_del+0x17c/0x3a4 [ 295.784620 ] mlx5_sf_dev_remove+0x28/0xf0 [mlx5_core] [ 295.794800 ] mlx5_sf_dev_table_destroy+0x98/0x110 [mlx5_core] [ 295.806375 ] mlx5_unload+0x34/0xd0 [mlx5_core] [ 295.815339 ] mlx5_unload_one+0x70/0xe4 [mlx5_core] [ 295.824998 ] shutdown+0xb0/0xd8 [mlx5_core] [ 295.833439 ] pci_device_shutdown+0x3c/0xa0 [ 295.841651 ] device_shutdown+0x170/0x340 [ 295.849486 ] __do_sys_reboot+0x1f4/0x2a0 [ 295.857322 ] __arm64_sys_reboot+0x2c/0x40 [ 295.865329 ] invoke_syscall+0x78/0x100 [ 295.872817 ] el0_svc_common.constprop.0+0x54/0x184 [ 295.882392 ] do_el0_svc+0x30/0xac [ 295.889008 ] el0_svc+0x48/0x160 [ 295.895278 ] el0t_64_sync_handler+0xa4/0x130 [ 295.903807 ] el0t_64_sync+0x1a4/0x1a8 [ 295.911120 ] ---[ end trace 4f1d2381d00d9dce ]--- Fixes: fe578cbb2f05 ("net/mlx5: Move devlink registration before mlx5_load") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
7a9770f1 |
|
17-May-2023 |
Moshe Shemesh <moshe@nvidia.com> |
net/mlx5: Handle sync reset unload event Added a new event handler to firmware sync reset, which is used to support firmware sync reset flow on smart NIC. Adding this new stage to the flow enables the firmware to ensure host PFs unload before ECPFs unload, to avoid race of PFs recovery. If firmware sends sync_reset_unload event to driver the driver should unload and close all HW resources of the function. Once the driver finishes unloading part, it can't get any more events from firmware as event queues are closed, so it polls the reset state field to know when to continue to next stage of the sync reset flow. Added capability bit for supporting sync_reset_unload event. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
e71383fb |
|
03-May-2023 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: Light probe local SFs In case user wants to configure the SFs, for example: to use only vdpa functionality, he needs to fully probe a SF, configure what he wants, and afterward reload the SF. In order to save the time of the reload, local SFs will probe without any auxiliary sub-device, so that the SFs can be configured prior to its full probe. The defaults of the enable_* devlink params of these SFs are set to false. Usage example: Create SF: $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11 $ devlink port function set pci/0000:08:00.0/32768 \ hw_addr 00:00:00:00:00:11 state active Enable ETH auxiliary device: $ devlink dev param set auxiliary/mlx5_core.sf.1 \ name enable_eth value true cmode driverinit Now, in order to fully probe the SF, use devlink reload: $ devlink dev reload auxiliary/mlx5_core.sf.1 At this point the user have SF devlink instance with auxiliary device for the Ethernet functionality only. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
2059cf51 |
|
02-May-2023 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: Split function_setup() to enable and open functions mlx5_cmd_init_hca() is taking ~0.2 seconds. In case of a user who desire to disable some of the SF aux devices, and with large scale-1K SFs for example, this user will waste more than 3 minutes on mlx5_cmd_init_hca() which isn't needed at this stage. Downstream patch will change SFs which are probe over the E-switch, local SFs, to be probed without any aux dev. In order to support this, split function_setup() to avoid executing mlx5_cmd_init_hca(). Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
6d98f314 |
|
07-Mar-2023 |
Daniel Jurgens <danielj@nvidia.com> |
net/mlx5: Update SRIOV enable/disable to handle EC/VFs Previously on the embedded CPU platform SRIOV was never enabled/disabled via mlx5_core_sriov_configure. Host VF updates are provided by an event handler. Now in the disable flow it must be known if this is a disable due to driver unload or SRIOV detach, or if the user updated the number of VFs. If due to change in the number of VFs only wait for the pages of ECVFs. Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
bbfa4b58 |
|
28-Apr-2023 |
Moshe Shemesh <moshe@nvidia.com> |
net/mlx5: Read embedded cpu after init bit cleared During driver load it reads embedded_cpu bit from initialization segment, but the initialization segment is readable only after initialization bit is cleared. Move the call to mlx5_read_embedded_cpu() right after initialization bit cleared. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Fixes: 591905ba9679 ("net/mlx5: Introduce Mellanox SmartNIC and modify page management logic") Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
824c8dc4 |
|
23-Apr-2023 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: Drain health before unregistering devlink mlx5 health mechanism is using devlink APIs, which are using devlink notify APIs. After the cited patch, using devlink notify APIs after devlink is unregistered triggers a WARN_ON(). Hence, drain health WQ before devlink is unregistered. Fixes: cf530217408e ("devlink: Notify users when objects are accessible") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
a6573514 |
|
01-May-2023 |
Roi Dayan <roid@nvidia.com> |
net/mlx5: Fix error message when failing to allocate device memory Fix spacing for the error and also the correct error code pointer. Fixes: c9b9dcb430b3 ("net/mlx5: Move device memory management to mlx5_core") Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
f4244e55 |
|
19-Mar-2023 |
Or Har-Toov <ohartoov@nvidia.com> |
net/mlx5: Set out of order (ooo) by default When FW supports ooo by default, enable the cap. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/00bd14bfb002ed2338de3296bcd9af27d4770b70.1679230449.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
21608a2c |
|
09-Apr-2023 |
Moshe Shemesh <moshe@nvidia.com> |
Revert "net/mlx5: Remove "recovery" arg from mlx5_load_one() function" This reverts commit 5977ac3910f1cbaf44dca48179118b25c206ac29. Revert this patch as we need the "recovery" arg back in mlx5_load_one() function. This arg will be used in the next patch for using recovery timeout during sync reset flow. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
f9c895a7 |
|
22-Mar-2023 |
Roi Dayan <roid@nvidia.com> |
net/mlx5: Update op_mode to op_mod for port selection To be consistent with the other enum keys use OP_MOD instead of OP_MODE. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
9df839a7 |
|
23-Apr-2020 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Create a new profile for SFs Create a new profile for SFs in order to disable the command cache. Each function command cache consumes ~500KB of memory, when using a large number of SFs this savings is notable on memory constarined systems. Use a new profile to provide for future differences between SFs and PFs. The mr_cache not used for non-PF functions, so it is excluded from the new profile. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Bodong Wang <bodong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
fe578cbb |
|
02-Jan-2023 |
Eli Cohen <elic@nvidia.com> |
net/mlx5: Move devlink registration before mlx5_load In order to allow reference to devlink parameters during driver load, move the devlink registration before mlx5_load. Subsequent patch will use it to control the number of completion vectors required based on whether eth is enabled or not. Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
|
#
3354822c |
|
31-Dec-2022 |
Eli Cohen <elic@nvidia.com> |
net/mlx5: Use dynamic msix vectors allocation Current implementation calculates the number and the partitioaning of available interrupts vectors and then allocates all the interrupt vectors. Here, whenever dynamic msix allocation is supported, we change this to use msix vectors dynamically so a vectors is actually allocated only when needed. The current pool logic is kept in place to take care of partitioning the vectors between the consumers and take care of reference counting. However, the vectors are allocated only when needed. Subsequent patches will make use of this to allocate vectors for VDPA. Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
|
#
5b6f4bd2 |
|
23-Mar-2023 |
Cai Huoqing <cai.huoqing@linux.dev> |
net/mlx5: Remove redundant pci_clear_master Remove pci_clear_master to simplify the code, the bus-mastering is also cleared in do_pci_disable_device, like this: ./drivers/pci/pci.c:2197 static void do_pci_disable_device(struct pci_dev *dev) { u16 pci_command; pci_read_config_word(dev, PCI_COMMAND, &pci_command); if (pci_command & PCI_COMMAND_MASTER) { pci_command &= ~PCI_COMMAND_MASTER; pci_write_config_word(dev, PCI_COMMAND, pci_command); } pcibios_disable_device(dev); }. And dev->is_busmaster is set to 0 in pci_disable_device. Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c1fef618 |
|
13-Mar-2023 |
Sandipan Patra <spatra@nvidia.com> |
net/mlx5: Implement thermal zone Implement thermal zone support for mlx5 based HW. The NIC uses temperature sensor provided by ASIC to report current temperature to thermal core. Signed-off-by: Sandipan Patra <spatra@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230314054234.267365-5-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
c05d145a |
|
13-Mar-2023 |
Moshe Shemesh <moshe@nvidia.com> |
net/mlx5: remove redundant clear_bit When shutdown or remove callbacks are called the driver sets the flag MLX5_BREAK_FW_WAIT, to stop waiting for FW as teardown was called. There is no need to clear the bit as once shutdown or remove were called as there is no way back, the driver is going down. Furthermore, if not cleared the flag can be used also in other loops where we may wait while teardown was already called. Use test_bit() instead of test_and_clear_bit() as there is no need to clear the flag. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230314054234.267365-2-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
031a163f |
|
28-Feb-2023 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: Set BREAK_FW_WAIT flag first when removing driver Currently, BREAK_FW_WAIT flag is set after syncing with fw_reset. However, fw_reset can call mlx5_load_one() which is waiting for fw init bit and BREAK_FW_WAIT flag is intended to stop. e.g.: the driver might wait on a loop it should exit. Fix it by setting the flag before syncing with fw_reset. Fixes: 8324a02c342a ("net/mlx5: Add exit route when waiting for FW") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
7ba930fc |
|
19-Oct-2022 |
Daniel Jurgens <danielj@nvidia.com> |
net/mlx5: Disable eswitch before waiting for VF pages The offending commit changed the ordering of moving to legacy mode and waiting for the VF pages. Moving to legacy mode is important in bluefield, because it sends the host driver into error state, and frees its pages. Without this transition we end up waiting 2 minutes for pages that aren't coming before carrying on with the unload process. Fixes: f019679ea5f2 ("net/mlx5: E-switch, Remove dependency between sriov and eswitch mode") Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
72ed5d56 |
|
26-Jan-2023 |
Jiri Pirko <jiri@nvidia.com> |
net/mlx5: Suspend auxiliary devices only in case of PCI device suspend The original behavior introduced by commit c6acd629eec7 ("net/mlx5e: Add support for devlink-port in non-representors mode") correctly re-instantiated uplink devlink port and related netdevice during devlink reload. However with migration to auxiliary devices, this behaviour changed. Restore the original behaviour and tear down auxiliary devices completely during devlink reload. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
5977ac39 |
|
26-Jan-2023 |
Jiri Pirko <jiri@nvidia.com> |
net/mlx5: Remove "recovery" arg from mlx5_load_one() function mlx5_load_one() is always called with recovery==false, so remove the unneeded function arg. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
c7d4e6ab |
|
01-Nov-2022 |
Jiri Pirko <jiri@nvidia.com> |
net/mlx5e: Propagate an internal event in case uplink netdev changes Whenever uplink netdev is set/cleared, propagate newly introduced event to inform notifier blocks netdev was added/removed. Move the set() helper to core.c from header, introduce clear() and netdev_added_event_replay() helpers. The last one is going to be called from rdma driver, so export it. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
fe298bdf |
|
22-May-2022 |
Jianbo Liu <jianbol@nvidia.com> |
net/mlx5: Prepare for fast crypto key update if hardware supports it Add CAP for crypto offload, do the simple initialization if hardware supports it. Currently set log_dek_obj_range to 12, so 4k DEKs will be created in one bulk allocation. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
075935f0 |
|
26-Jan-2023 |
Jiri Pirko <jiri@nvidia.com> |
devlink: protect devlink param list by instance lock Commit 1d18bb1a4ddd ("devlink: allow registering parameters after the instance") as the subject implies introduced possibility to register devlink params even for already registered devlink instance. This is a bit problematic, as the consistency or params list was originally secured by the fact it is static during devlink lifetime. So in order to protect the params list, take devlink instance lock during the params operations. Introduce unlocked function variants and use them in drivers in locked context. Put lock assertions to appropriate places. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c8aebff4 |
|
26-Jan-2023 |
Jiri Pirko <jiri@nvidia.com> |
net/mlx5: Change devlink param register/unregister function names The functions are registering and unregistering devlink params, so change the names accordingly. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8f0d1451 |
|
14-Dec-2022 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: Serialize module cleanup with reload and remove Currently, remove and reload flows can run in parallel to module cleanup. This design is error prone. For example: aux_drivers callbacks are called from both cleanup and remove flows with different lockings, which can cause a deadlock[1]. Hence, serialize module cleanup with reload and remove. [1] cleanup remove ------- ------ auxiliary_driver_unregister(); devl_lock() auxiliary_device_delete(mlx5e_aux) device_lock(mlx5e_aux) devl_lock() device_lock(mlx5e_aux) Fixes: 912cebf420c2 ("net/mlx5e: Connect ethernet part to auxiliary bus") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
2c1e1b94 |
|
30-Aug-2022 |
Randy Dunlap <rdunlap@infradead.org> |
net: mlx5: eliminate anonymous module_init & module_exit Eliminate anonymous module_init() and module_exit(), which can lead to confusion or ambiguity when reading System.map, crashes/oops/bugs, or an initcall_debug log. Give each of these init and exit functions unique driver-specific names to eliminate the anonymous names. Example 1: (System.map) ffffffff832fc78c t init ffffffff832fc79e t init ffffffff832fc8f8 t init Example 2: (initcall_debug log) calling init+0x0/0x12 @ 1 initcall init+0x0/0x12 returned 0 after 15 usecs calling init+0x0/0x60 @ 1 initcall init+0x0/0x60 returned 0 after 2 usecs calling init+0x0/0x9a @ 1 initcall init+0x0/0x9a returned 0 after 74 usecs Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Eli Cohen <eli@mellanox.com> Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: linux-rdma@vger.kernel.org Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
c4ad5f2b |
|
09-Nov-2022 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: Fix RoCE setting at HCA level mlx5 PF can disable RoCE for its VFs and SFs. In such case RoCE is marked as unsupported on those VFs/SFs. The cited patch added an option for disable (and enable) RoCE at HCA level. However, that commit didn't check whether RoCE is supported on the HCA and enabled user to try and set RoCE to on. Fix it by checking whether the HCA supports RoCE. Fixes: fbfa97b4d79f ("net/mlx5: Disable roce at HCA level") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
2a35b2c2 |
|
17-Oct-2022 |
Jiri Pirko <jiri@nvidia.com> |
net/mlx5: Add forgotten cleanup calls into mlx5_init_once() error path There are two cleanup calls missing in mlx5_init_once() error path. Add them making the error path flow to be the same as mlx5_cleanup_once(). Fixes: 52ec462eca9b ("net/mlx5: Add reserved-gids support") Fixes: 7c39afb394c7 ("net/mlx5: PTP code migration to driver core section") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
12eb0f84 |
|
29-Sep-2022 |
Petr Pavlu <petr.pavlu@suse.com> |
net/mlx5: Remove unused ctx variables Remove mlx5_priv.ctx_list and ctx_lock which are no longer used after commit 601c10c89cbb ("net/mlx5: Delete custom device management logic"). Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
02ca1732 |
|
27-Nov-2022 |
Christophe JAILLET <christophe.jaillet@wanadoo.fr> |
net/mlx5e: Remove unneeded io-mapping.h #include The mlx5 net files don't use io_mapping functionalities. So there is no point in including <linux/io-mapping.h>. Remove it. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
dd3dd726 |
|
21-Sep-2022 |
Eli Cohen <elic@nvidia.com> |
net/mlx5: Expose vhca_id to debugfs hca_id is an identifier of an mlx5_core instance within the hardware. This identifier may be required for troubleshooting. Expose it to debugfs. Example: $ cat /sys/kernel/debug/mlx5/mlx5_core.sf.2/vhca_id 0x12 Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
71b75f0e |
|
08-Aug-2022 |
Moshe Shemesh <moshe@nvidia.com> |
net/mlx5: Unregister traps on driver unload flow Before this patch, devlink traps are registered only on full driver probe and unregistered on driver removal. As devlink traps are not usable once driver functionality is unloaded, it should be unrgeistered also on flows that unload the driver and then registered when loaded back, e.g. devlink reload flow. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
394164f9 |
|
24-Jul-2022 |
Roy Novich <royno@nvidia.com> |
net/mlx5: Do not query pci info while pci disabled The driver should not interact with PCI while PCI is disabled. Trying to do so may result in being unable to get vital signs during PCI reset, driver gets timed out and fails to recover. Fixes: fad1783a6d66 ("net/mlx5: Print more info on pci error handlers") Signed-off-by: Roy Novich <royno@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
416ef713 |
|
26-Oct-2022 |
Roy Novich <royno@nvidia.com> |
net/mlx5: Update fw fatal reporter state on PCI handlers successful recover Update devlink health fw fatal reporter state to "healthy" is needed by strictly calling devlink_health_reporter_state_update() after recovery was done by PCI error handler. This is needed when fw_fatal reporter was triggered due to PCI error. Poll health is called and set reporter state to error. Health recovery failed (since EEH didn't re-enable the PCI). PCI handlers keep on recover flow and succeed later without devlink acknowledgment. Fix this by adding devlink state update at the end of the PCI handler recovery process. Fixes: 6181e5cb752e ("devlink: add support for reporter recovery completion") Signed-off-by: Roy Novich <royno@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221026135153.154807-11-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
9b98d395 |
|
01-Oct-2022 |
Moshe Shemesh <moshe@nvidia.com> |
net/mlx5: Start health poll at earlier stage of driver load Start health poll at earlier stage, so if fw fatal issue occurred before or during initialization commands such as init_hca or set_hca_cap the poll health can detect and indicate that the driver is already in error state. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
90b1df74 |
|
07-Sep-2022 |
Liu, Changcheng <jerrliu@nvidia.com> |
net/mlx5: detect and enable bypass port select flow table Use port selection capability port_select_flow_table_bypass bit to detect and enable explicit port affinity even when in lag hash mode. Signed-off-by: Liu, Changcheng <jerrliu@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
8ff0ac5b |
|
05-Sep-2022 |
Lior Nahmanson <liorna@nvidia.com> |
net/mlx5: Add MACsec offload Tx command support This patch adds support for Connect-X MACsec offload Tx SA commands: add, update and delete. In Connect-X MACsec, a Security Association (SA) is added or deleted via allocating a HW context of an encryption/decryption key and a HW context of a matching SA (MACsec object). When new SA is added: - Use a separate crypto key HW context. - Create a separate MACsec context in HW to include the SA properties. Introduce a new compilation flag MLX5_EN_MACSEC for it. Follow-up patches will implement the Tx steering. Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
93983863 |
|
05-Sep-2022 |
Yishai Hadas <yishaih@nvidia.com> |
net/mlx5: Query ADV_VIRTUALIZATION capabilities Query ADV_VIRTUALIZATION capabilities which provide information for advanced virtualization related features. Current capabilities refer to the page tracker object which is used for tracking the pages that are dirtied by the device. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220905105852.26398-3-yishaih@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
9ca05b0f |
|
28-Aug-2022 |
Maher Sanalla <msanalla@nvidia.com> |
RDMA/mlx5: Rely on RoCE fw cap instead of devlink when setting profile When the RDMA auxiliary driver probes, it sets its profile based on devlink driverinit value. The latter might not be in sync with FW yet (In case devlink reload is not performed), thus causing a mismatch between RDMA driver and FW. This results in the following FW syndrome when the RDMA driver tries to adjust RoCE state, which fails the probe: "0xC1F678 | modify_nic_vport_context: roce_en set on a vport that doesn't support roce" To prevent this, select the PF profile based on FW RoCE capability instead of relying on devlink driverinit value. To provide backward compatibility of the RoCE disable feature, on older FW's where roce_rw is not set (FW RoCE capability is read-only), keep the current behavior e.g., rely on devlink driverinit value. Fixes: fbfa97b4d79f ("net/mlx5: Disable roce at HCA level") Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Link: https://lore.kernel.org/r/cb34ce9a1df4a24c135cb804db87f7d2418bd6cc.1661763459.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
d59b73a6 |
|
03-Aug-2022 |
Moshe Shemesh <moshe@nvidia.com> |
net/mlx5: Avoid false positive lockdep warning by adding lock_class_key Add a lock_class_key per mlx5 device to avoid a false positive "possible circular locking dependency" warning by lockdep, on flows which lock more than one mlx5 device, such as adding SF. kernel log: ====================================================== WARNING: possible circular locking dependency detected 5.19.0-rc8+ #2 Not tainted ------------------------------------------------------ kworker/u20:0/8 is trying to acquire lock: ffff88812dfe0d98 (&dev->intf_state_mutex){+.+.}-{3:3}, at: mlx5_init_one+0x2e/0x490 [mlx5_core] but task is already holding lock: ffff888101aa7898 (&(¬ifier->n_head)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x5a/0x130 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&(¬ifier->n_head)->rwsem){++++}-{3:3}: down_write+0x90/0x150 blocking_notifier_chain_register+0x53/0xa0 mlx5_sf_table_init+0x369/0x4a0 [mlx5_core] mlx5_init_one+0x261/0x490 [mlx5_core] probe_one+0x430/0x680 [mlx5_core] local_pci_probe+0xd6/0x170 work_for_cpu_fn+0x4e/0xa0 process_one_work+0x7c2/0x1340 worker_thread+0x6f6/0xec0 kthread+0x28f/0x330 ret_from_fork+0x1f/0x30 -> #0 (&dev->intf_state_mutex){+.+.}-{3:3}: __lock_acquire+0x2fc7/0x6720 lock_acquire+0x1c1/0x550 __mutex_lock+0x12c/0x14b0 mlx5_init_one+0x2e/0x490 [mlx5_core] mlx5_sf_dev_probe+0x29c/0x370 [mlx5_core] auxiliary_bus_probe+0x9d/0xe0 really_probe+0x1e0/0xaa0 __driver_probe_device+0x219/0x480 driver_probe_device+0x49/0x130 __device_attach_driver+0x1b8/0x280 bus_for_each_drv+0x123/0x1a0 __device_attach+0x1a3/0x460 bus_probe_device+0x1a2/0x260 device_add+0x9b1/0x1b40 __auxiliary_device_add+0x88/0xc0 mlx5_sf_dev_state_change_handler+0x67e/0x9d0 [mlx5_core] blocking_notifier_call_chain+0xd5/0x130 mlx5_vhca_state_work_handler+0x2b0/0x3f0 [mlx5_core] process_one_work+0x7c2/0x1340 worker_thread+0x59d/0xec0 kthread+0x28f/0x330 ret_from_fork+0x1f/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&(¬ifier->n_head)->rwsem); lock(&dev->intf_state_mutex); lock(&(¬ifier->n_head)->rwsem); lock(&dev->intf_state_mutex); *** DEADLOCK *** 4 locks held by kworker/u20:0/8: #0: ffff888150612938 ((wq_completion)mlx5_events){+.+.}-{0:0}, at: process_one_work+0x6e2/0x1340 #1: ffff888100cafdb8 ((work_completion)(&work->work)#3){+.+.}-{0:0}, at: process_one_work+0x70f/0x1340 #2: ffff888101aa7898 (&(¬ifier->n_head)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x5a/0x130 #3: ffff88813682d0e8 (&dev->mutex){....}-{3:3}, at:__device_attach+0x76/0x460 stack backtrace: CPU: 6 PID: 8 Comm: kworker/u20:0 Not tainted 5.19.0-rc8+ Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: mlx5_events mlx5_vhca_state_work_handler [mlx5_core] Call Trace: <TASK> dump_stack_lvl+0x57/0x7d check_noncircular+0x278/0x300 ? print_circular_bug+0x460/0x460 ? lock_chain_count+0x20/0x20 ? register_lock_class+0x1880/0x1880 __lock_acquire+0x2fc7/0x6720 ? register_lock_class+0x1880/0x1880 ? register_lock_class+0x1880/0x1880 lock_acquire+0x1c1/0x550 ? mlx5_init_one+0x2e/0x490 [mlx5_core] ? lockdep_hardirqs_on_prepare+0x400/0x400 __mutex_lock+0x12c/0x14b0 ? mlx5_init_one+0x2e/0x490 [mlx5_core] ? mlx5_init_one+0x2e/0x490 [mlx5_core] ? _raw_read_unlock+0x1f/0x30 ? mutex_lock_io_nested+0x1320/0x1320 ? __ioremap_caller.constprop.0+0x306/0x490 ? mlx5_sf_dev_probe+0x269/0x370 [mlx5_core] ? iounmap+0x160/0x160 mlx5_init_one+0x2e/0x490 [mlx5_core] mlx5_sf_dev_probe+0x29c/0x370 [mlx5_core] ? mlx5_sf_dev_remove+0x130/0x130 [mlx5_core] auxiliary_bus_probe+0x9d/0xe0 really_probe+0x1e0/0xaa0 __driver_probe_device+0x219/0x480 ? auxiliary_match_id+0xe9/0x140 driver_probe_device+0x49/0x130 __device_attach_driver+0x1b8/0x280 ? driver_allows_async_probing+0x140/0x140 bus_for_each_drv+0x123/0x1a0 ? bus_for_each_dev+0x1a0/0x1a0 ? lockdep_hardirqs_on_prepare+0x286/0x400 ? trace_hardirqs_on+0x2d/0x100 __device_attach+0x1a3/0x460 ? device_driver_attach+0x1e0/0x1e0 ? kobject_uevent_env+0x22d/0xf10 bus_probe_device+0x1a2/0x260 device_add+0x9b1/0x1b40 ? dev_set_name+0xab/0xe0 ? __fw_devlink_link_to_suppliers+0x260/0x260 ? memset+0x20/0x40 ? lockdep_init_map_type+0x21a/0x7d0 __auxiliary_device_add+0x88/0xc0 ? auxiliary_device_init+0x86/0xa0 mlx5_sf_dev_state_change_handler+0x67e/0x9d0 [mlx5_core] blocking_notifier_call_chain+0xd5/0x130 mlx5_vhca_state_work_handler+0x2b0/0x3f0 [mlx5_core] ? mlx5_vhca_event_arm+0x100/0x100 [mlx5_core] ? lock_downgrade+0x6e0/0x6e0 ? lockdep_hardirqs_on_prepare+0x286/0x400 process_one_work+0x7c2/0x1340 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? pwq_dec_nr_in_flight+0x230/0x230 ? rwlock_bug.part.0+0x90/0x90 worker_thread+0x59d/0xec0 ? process_one_work+0x1340/0x1340 kthread+0x28f/0x330 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Fixes: 6a3273217469 ("net/mlx5: SF, Port function state change support") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
d3dbdc9f |
|
28-Jul-2022 |
Moshe Shemesh <moshe@nvidia.com> |
net/mlx5: Lock mlx5 devlink health recovery callback Change devlink instance locks in mlx5 driver to have devlink health recovery callback locked, while keeping all driver paths which lead to devl_ API functions called by the driver locked. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
84a433a4 |
|
28-Jul-2022 |
Moshe Shemesh <moshe@nvidia.com> |
net/mlx5: Lock mlx5 devlink reload callbacks Change devlink instance locks in mlx5 driver to have devlink reload callbacks locked, while keeping all driver paths which lead to devl_ API functions called by the driver locked. Add mlx5_load_one_devl_locked() and mlx5_unload_one_devl_locked() which are used by the paths which are already locked such as devlink reload callbacks. This patch makes the driver use devl_ API also for traps register as these functions are called from the driver paths parallel to reload that requires locking now. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
42b4f7f6 |
|
27-Jun-2022 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: Fix driver use of uninitialized timeout Currently, driver is setting default values to all timeouts during function setup. The offending commit is using a timeout before function setup, meaning: the timeout is 0 (or garbage), since no value have been set. This may result in failure to probe the driver: mlx5_function_setup:1034:(pid 69850): Firmware over 4294967296 MS in pre-initializing state, aborting probe_one:1591:(pid 69850): mlx5_init_one failed with error code -16 Hence, set default values to timeouts during tout_init() Fixes: 37ca95e62ee2 ("net/mlx5: Increase FW pre-init timeout for health recovery") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
a6e9085d |
|
24-Jul-2022 |
Maher Sanalla <msanalla@nvidia.com> |
net/mlx5: Adjust log_max_qp to be 18 at most The cited commit limited log_max_qp to be 17 due to FW capabilities. Recently, it turned out that there are old FW versions that supported more than 17, so the cited commit caused a degradation. Thus, set the maximum log_max_qp back to 18 as it was before the cited commit. Fixes: 7f839965b2d7 ("net/mlx5: Update log_max_qp value to be 17 at most") Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
dc402ccc |
|
01-Jun-2022 |
Yishai Hadas <yishaih@nvidia.com> |
net/mlx5: Use software VHCA id when it's supported Use software VHCA id when it's supported by the firmware. A unique id is allocated upon mlx5_mdev_init() and freed upon mlx5_mdev_uninit(), as such it stays the same during the full life cycle of the device including upon health recovery if occurred. The conjunction of sw_vhca_id with sw_owner_id will be a global unique id per function which uses mlx5_core. The sw_vhca_id is set upon init_hca command and is used to specify the VHCA that the NIC vport is affiliated with. This functionality is needed upon migration of VM which is MPV based. (i.e. multi port device). Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
f019679e |
|
29-May-2022 |
Chris Mi <cmi@nvidia.com> |
net/mlx5: E-switch, Remove dependency between sriov and eswitch mode Currently, there are three eswitch modes, none, legacy and switchdev. None is the default mode. Remove redundant none mode as eswitch mode should always be either legacy mode or switchdev mode. With this patch, there are two behavior changes: 1. Legacy becomes the default mode. When querying eswitch mode using devlink, a valid mode is always returned. 2. When disabling sriov, the eswitch mode will not change, only vfs are unloaded. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
d107ba1f |
|
08-Jun-2022 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: Remove not used MLX5_CAP_BITS_RW_MASK Remove not used MLX5_CAP_BITS_RW_MASK. While at it, remove CAP_MASK, MLX5_CAP_OFF_CMDIF_CSUM and MLX5_DEV_CAP_FLAG_*, since MLX5_CAP_BITS_RW_MASK was their only user. Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
b0ea505b |
|
21-May-2022 |
Julia Lawall <Julia.Lawall@inria.fr> |
net/mlx5: fix typo in comment Spelling mistake (triple letters) in comment. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4c7c8a6d |
|
18-Apr-2022 |
Saeed Mahameed <saeedm@nvidia.com> |
net/mlx5: sparse: error: context imbalance in 'mlx5_vf_get_core_dev' Removing the annotation resolves the issue for some reason. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
37ca95e6 |
|
27-Mar-2022 |
Gavin Li <gavinl@nvidia.com> |
net/mlx5: Increase FW pre-init timeout for health recovery Currently, health recovery will reload driver to recover it from fatal errors. During the driver's load process, it would wait for FW to set the pre-init bit for up to 120 seconds, beyond this threshold it would abort the load process. In some cases, such as a FW upgrade on the DPU, this timeout period is insufficient, and the user has no way to recover the host device. To solve this issue, introduce a new FW pre-init timeout for health recovery, which is set to 2 hours. The timeout for devlink reload and probe will use the original one because they are user triggered flows, and therefore should not have a significantly long timeout, during which the user command would hang. Signed-off-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
8324a02c |
|
27-Mar-2022 |
Gavin Li <gavinl@nvidia.com> |
net/mlx5: Add exit route when waiting for FW Currently, removing a device needs to get the driver interface lock before doing any cleanup. If the driver is waiting in a loop for FW init, there is no way to cancel the wait, instead the device cleanup waits for the loop to conclude and release the lock. To allow immediate response to remove device commands, check the TEARDOWN flag while waiting for FW init, and exit the loop if it has been set. Signed-off-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
c6e3b421 |
|
09-Mar-2022 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5: Merge various control path IPsec headers into one file The mlx5 IPsec code has logical separation between code that operates with XFRM objects (ipsec.c), HW objects (ipsec_offload.c), flow steering logic (ipsec_fs.c) and data path (ipsec_rxtx.c). Such separation makes sense for C-files, but isn't needed at all for H-files as they are included in batch anyway. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
cdfc6ffb |
|
22-Mar-2022 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: Print initializing field in case of timeout Print the initializing field in case of FW couldn't initialize before timeout. This will help to better understand the root cause in some cases. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
f2b41b32 |
|
06-Apr-2022 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5: Remove ipsec_ops function table There is only one IPsec implementation and ipsec_ops is not needed at all in this situation. Together with removal of ipsec_ops, we can drop the entry checks as these functions are called for IPsec devices only. Link: https://lore.kernel.org/r/bc8dd1c8a77b65dbf5e2cf92c813ffaca2505c5f.1649232994.git.leonro@nvidia.com Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
#
16fe5a1c |
|
06-Apr-2022 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5: Move IPsec file to relevant directory IPsec is part of ethernet side of mlx5 driver and needs to be placed in en_accel folder. Link: https://lore.kernel.org/r/a0ca88f4d9c602c574106c0de0511803e7dcbdff.1649232994.git.leonro@nvidia.com Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
#
7e4e8491 |
|
06-Apr-2022 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5: Remove ipsec vs. ipsec offload file separation The IPsec won't be initialized at all if device doesn't support IPsec offload. It means that we can combine the ipsec.c and ipsec_offload.c files to one file. Such change will allow us to remove ipsec_ops indirection. Link: https://lore.kernel.org/r/d0ac1fb7b14c10ae20a21ae17a393ee860c72ac3.1649232994.git.leonro@nvidia.com Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
#
2fa33b35 |
|
06-Apr-2022 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5_fpga: Drop INNOVA IPsec support Mellanox INNOVA IPsec cards are EOL in Nov, 2019 [1]. As such, the code is unmaintained, untested and not in-use by any upstream/distro oriented customers. In order to reduce code complexity, drop the kernel code. [1] https://network.nvidia.com/related-docs/eol/LCR-000535.pdf Link: https://lore.kernel.org/r/2afe88ec5020a491079eacf6fe3c89b64d65195c.1649232994.git.leonro@nvidia.com Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
#
40379a00 |
|
04-Apr-2022 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5_fpga: Drop INNOVA TLS support Mellanox INNOVA TLS cards are EOL in May, 2018 [1]. As such, the code is unmaintained, untested and not in-use by any upstream/distro oriented customers. In order to reduce code complexity, drop the kernel code. [1] https://network.nvidia.com/related-docs/eol/LCR-000286.pdf Link: https://lore.kernel.org/r/b88add368def721ea9d054cb69def72d9e3f67aa.1649073691.git.leonro@nvidia.com Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
#
16d42d31 |
|
04-Apr-2022 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: Drain fw_reset when removing device In case fw sync reset is called in parallel to device removal, device might stuck in the following deadlock: CPU 0 CPU 1 ----- ----- remove_one uninit_one (locks intf_state_mutex) mlx5_sync_reset_now_event() work in fw_reset->wq. mlx5_enter_error_state() mutex_lock (intf_state_mutex) cleanup_once fw_reset_cleanup() destroy_workqueue(fw_reset->wq) Drain the fw_reset WQ, and make sure no new work is being queued, before entering uninit_one(). The Drain is done before devlink_unregister() since fw_reset, in some flows, is using devlink API devlink_remote_reload_actions_performed(). Fixes: 38b9f903f22b ("net/mlx5: Handle sync reset request event") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
b3388697 |
|
09-Mar-2022 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: Initialize flow steering during driver probe Currently, software objects of flow steering are created and destroyed during reload flow. In case a device is unloaded, the following error is printed during grace period: mlx5_core 0000:00:0b.0: mlx5_fw_fatal_reporter_err_work:690:(pid 95): Driver is in error state. Unloading As a solution to fix use-after-free bugs, where we try to access these objects, when reading the value of flow_steering_mode devlink param[1], let's split flow steering creation and destruction into two routines: * init and cleanup: memory, cache, and pools allocation/free. * create and destroy: namespaces initialization and cleanup. While at it, re-order the cleanup function to mirror the init function. [1] Kasan trace: [ 385.119849 ] BUG: KASAN: use-after-free in mlx5_devlink_fs_mode_get+0x3b/0xa0 [ 385.119849 ] Read of size 4 at addr ffff888104b79308 by task bash/291 [ 385.119849 ] [ 385.119849 ] CPU: 1 PID: 291 Comm: bash Not tainted 5.17.0-rc1+ #2 [ 385.119849 ] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014 [ 385.119849 ] Call Trace: [ 385.119849 ] <TASK> [ 385.119849 ] dump_stack_lvl+0x6e/0x91 [ 385.119849 ] print_address_description.constprop.0+0x1f/0x160 [ 385.119849 ] ? mlx5_devlink_fs_mode_get+0x3b/0xa0 [ 385.119849 ] ? mlx5_devlink_fs_mode_get+0x3b/0xa0 [ 385.119849 ] kasan_report.cold+0x83/0xdf [ 385.119849 ] ? devlink_param_notify+0x20/0x190 [ 385.119849 ] ? mlx5_devlink_fs_mode_get+0x3b/0xa0 [ 385.119849 ] mlx5_devlink_fs_mode_get+0x3b/0xa0 [ 385.119849 ] devlink_nl_param_fill+0x18a/0xa50 [ 385.119849 ] ? _raw_spin_lock_irqsave+0x8d/0xe0 [ 385.119849 ] ? devlink_flash_update_timeout_notify+0xf0/0xf0 [ 385.119849 ] ? __wake_up_common+0x4b/0x1e0 [ 385.119849 ] ? preempt_count_sub+0x14/0xc0 [ 385.119849 ] ? _raw_spin_unlock_irqrestore+0x28/0x40 [ 385.119849 ] ? __wake_up_common_lock+0xe3/0x140 [ 385.119849 ] ? __wake_up_common+0x1e0/0x1e0 [ 385.119849 ] ? __sanitizer_cov_trace_const_cmp8+0x27/0x80 [ 385.119849 ] ? __rcu_read_unlock+0x48/0x70 [ 385.119849 ] ? kasan_unpoison+0x23/0x50 [ 385.119849 ] ? __kasan_slab_alloc+0x2c/0x80 [ 385.119849 ] ? memset+0x20/0x40 [ 385.119849 ] ? __sanitizer_cov_trace_const_cmp4+0x25/0x80 [ 385.119849 ] devlink_param_notify+0xce/0x190 [ 385.119849 ] devlink_unregister+0x92/0x2b0 [ 385.119849 ] remove_one+0x41/0x140 [ 385.119849 ] pci_device_remove+0x68/0x140 [ 385.119849 ] ? pcibios_free_irq+0x10/0x10 [ 385.119849 ] __device_release_driver+0x294/0x3f0 [ 385.119849 ] device_driver_detach+0x82/0x130 [ 385.119849 ] unbind_store+0x193/0x1b0 [ 385.119849 ] ? subsys_interface_unregister+0x270/0x270 [ 385.119849 ] drv_attr_store+0x4e/0x70 [ 385.119849 ] ? drv_attr_show+0x60/0x60 [ 385.119849 ] sysfs_kf_write+0xa7/0xc0 [ 385.119849 ] kernfs_fop_write_iter+0x23a/0x2f0 [ 385.119849 ] ? sysfs_kf_bin_read+0x160/0x160 [ 385.119849 ] new_sync_write+0x311/0x430 [ 385.119849 ] ? new_sync_read+0x480/0x480 [ 385.119849 ] ? _raw_spin_lock+0x87/0xe0 [ 385.119849 ] ? __sanitizer_cov_trace_cmp4+0x25/0x80 [ 385.119849 ] ? security_file_permission+0x94/0xa0 [ 385.119849 ] vfs_write+0x4c7/0x590 [ 385.119849 ] ksys_write+0xf6/0x1e0 [ 385.119849 ] ? __x64_sys_read+0x50/0x50 [ 385.119849 ] ? fpregs_assert_state_consistent+0x99/0xa0 [ 385.119849 ] do_syscall_64+0x3d/0x90 [ 385.119849 ] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 385.119849 ] RIP: 0033:0x7fc36ef38504 [ 385.119849 ] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 48 8d 05 f9 61 0d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53 [ 385.119849 ] RSP: 002b:00007ffde0ff3d08 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 385.119849 ] RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007fc36ef38504 [ 385.119849 ] RDX: 000000000000000c RSI: 00007fc370521040 RDI: 0000000000000001 [ 385.119849 ] RBP: 00007fc370521040 R08: 00007fc36f00b8c0 R09: 00007fc36ee4b740 [ 385.119849 ] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fc36f00a760 [ 385.119849 ] R13: 000000000000000c R14: 00007fc36f005760 R15: 000000000000000c [ 385.119849 ] </TASK> [ 385.119849 ] [ 385.119849 ] Allocated by task 65: [ 385.119849 ] kasan_save_stack+0x1e/0x40 [ 385.119849 ] __kasan_kmalloc+0x81/0xa0 [ 385.119849 ] mlx5_init_fs+0x11b/0x1160 [ 385.119849 ] mlx5_load+0x13c/0x220 [ 385.119849 ] mlx5_load_one+0xda/0x160 [ 385.119849 ] mlx5_recover_device+0xb8/0x100 [ 385.119849 ] mlx5_health_try_recover+0x2f9/0x3a1 [ 385.119849 ] devlink_health_reporter_recover+0x75/0x100 [ 385.119849 ] devlink_health_report+0x26c/0x4b0 [ 385.275909 ] mlx5_fw_fatal_reporter_err_work+0x11e/0x1b0 [ 385.275909 ] process_one_work+0x520/0x970 [ 385.275909 ] worker_thread+0x378/0x950 [ 385.275909 ] kthread+0x1bb/0x200 [ 385.275909 ] ret_from_fork+0x1f/0x30 [ 385.275909 ] [ 385.275909 ] Freed by task 65: [ 385.275909 ] kasan_save_stack+0x1e/0x40 [ 385.275909 ] kasan_set_track+0x21/0x30 [ 385.275909 ] kasan_set_free_info+0x20/0x30 [ 385.275909 ] __kasan_slab_free+0xfc/0x140 [ 385.275909 ] kfree+0xa5/0x3b0 [ 385.275909 ] mlx5_unload+0x2e/0xb0 [ 385.275909 ] mlx5_unload_one+0x86/0xb0 [ 385.275909 ] mlx5_fw_fatal_reporter_err_work.cold+0xca/0xcf [ 385.275909 ] process_one_work+0x520/0x970 [ 385.275909 ] worker_thread+0x378/0x950 [ 385.275909 ] kthread+0x1bb/0x200 [ 385.275909 ] ret_from_fork+0x1f/0x30 [ 385.275909 ] [ 385.275909 ] The buggy address belongs to the object at ffff888104b79300 [ 385.275909 ] which belongs to the cache kmalloc-128 of size 128 [ 385.275909 ] The buggy address is located 8 bytes inside of [ 385.275909 ] 128-byte region [ffff888104b79300, ffff888104b79380) [ 385.275909 ] The buggy address belongs to the page: [ 385.275909 ] page:00000000de44dd39 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x104b78 [ 385.275909 ] head:00000000de44dd39 order:1 compound_mapcount:0 [ 385.275909 ] flags: 0x8000000000010200(slab|head|zone=2) [ 385.275909 ] raw: 8000000000010200 0000000000000000 dead000000000122 ffff8881000428c0 [ 385.275909 ] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000 [ 385.275909 ] page dumped because: kasan: bad access detected [ 385.275909 ] [ 385.275909 ] Memory state around the buggy address: [ 385.275909 ] ffff888104b79200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc [ 385.275909 ] ffff888104b79280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 385.275909 ] >ffff888104b79300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 385.275909 ] ^ [ 385.275909 ] ffff888104b79380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 385.275909 ] ffff888104b79400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 385.275909 ]] Fixes: e890acd5ff18 ("net/mlx5: Add devlink flow_steering_mode parameter") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
66771a1c |
|
18-Feb-2022 |
Moshe Shemesh <moshe@nvidia.com> |
net/mlx5: Move debugfs entries to separate struct Move the debugfs entry pointers under priv to their own struct. Add get function for device debugfs root. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
1695b97b |
|
24-Feb-2022 |
Yishai Hadas <yishaih@nvidia.com> |
net/mlx5: Expose APIs to get/put the mlx5 core device Expose an API to get the mlx5 core device from a given VF PCI device if mlx5_core is its driver. Upon the get API we stay with the intf_state_mutex locked to make sure that the device can't be gone/unloaded till the caller will complete its job over the device, this expects to be for a short period of time for any flow that the lock is taken. Upon the put API we unlock the intf_state_mutex. The use case for those APIs is the migration flow of a VF over VFIO PCI. In that case the VF doesn't ride on mlx5_core, because the device is driving *two* different PCI devices, the PF owned by mlx5_core and the VF owned by the vfio driver. The mlx5_core of the PF is accessed only during the narrow window of the VF's ioctl that requires its services. This allows the PF driver to be more independent of the VF driver, so long as it doesn't reset the FW. Link: https://lore.kernel.org/all/20220224142024.147653-6-yishaih@nvidia.com Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
#
143a41d7 |
|
24-Feb-2022 |
Yishai Hadas <yishaih@nvidia.com> |
net/mlx5: Disable SRIOV before PF removal Virtual functions depend on physical function for device access (for example firmware host PAGE management), so make sure to disable SR-IOV once PF is gone. This will prevent also the below warning if PF has gone before disabling SR-IOV. "driver left SR-IOV enabled after remove" Next patch from this series will rely on that when the VF may need to access safely the PF 'driver data'. Link: https://lore.kernel.org/all/20220224142024.147653-4-yishaih@nvidia.com Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
#
605bef00 |
|
05-Apr-2020 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: cmdif, cmd_check refactoring Do not mangle the command outbox in the internal low level cmd_exec and cmd_invoke functions. Instead return a proper unique error code and move the driver error checking to be at a higher level in mlx5_cmd_exec(). Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
7f839965 |
|
16-Feb-2022 |
Maher Sanalla <msanalla@nvidia.com> |
net/mlx5: Update log_max_qp value to be 17 at most Currently, log_max_qp value is dependent on what FW reports as its max capability. In reality, due to a bug, some FWs report a value greater than 17, even though they don't support log_max_qp > 17. This FW issue led the driver to exhaust memory on startup. Thus, log_max_qp value is set to be no more than 17 regardless of what FW reports, as it was before the cited commit. Fixes: f79a609ea6bf ("net/mlx5: Update log_max_qp value to FW max capability") Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
f908a35b |
|
10-Jan-2022 |
Meir Lichtinger <meirl@nvidia.com> |
net/mlx5: Update the list of the PCI supported devices Add the upcoming BlueField-4 and ConnectX-8 device IDs. Fixes: 2e9d3e83ab82 ("net/mlx5: Update the list of the PCI supported devices") Signed-off-by: Meir Lichtinger <meirl@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
f79a609e |
|
05-Jan-2022 |
Maher Sanalla <msanalla@nvidia.com> |
net/mlx5: Update log_max_qp value to FW max capability log_max_qp in driver's default profile #2 was set to 18, but FW actually supports 17 at the most - a situation that led to the concerning print when the driver is loaded: "log_max_qp value in current profile is 18, changing to HCA capabaility limit (17)" The expected behavior from mlx5_profile #2 is to match the maximum FW capability in regards to log_max_qp. Thus, log_max_qp in profile #2 is initialized to a defined static value (0xff) - which basically means that when loading this profile, log_max_qp value will be what the currently installed FW supports at most. Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
8680a60f |
|
08-Dec-2021 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: Let user configure max_macs generic param Currently, max_macs is taking 70Kbytes of memory per function. This size is not needed in all use cases, and is critical with large scale. Hence, allow user to configure the number of max_macs. For example, to reduce the number of max_macs to 1, execute:: $ devlink dev param set pci/0000:00:0b.0 name max_macs value 1 \ cmode driverinit $ devlink dev reload pci/0000:00:0b.0 Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
fad1783a |
|
04-Oct-2021 |
Saeed Mahameed <saeedm@nvidia.com> |
net/mlx5: Print more info on pci error handlers In case mlx5_pci_err_detected was called with state equals to pci_channel_io_perm_failure, the driver will never come back up. It is nice to know why the driver went to zombie land, so print some useful information on pci err handlers. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
|
#
33de865f |
|
23-Nov-2021 |
Moshe Shemesh <moshe@nvidia.com> |
net/mlx5: Fix SF health recovery flow SF do not directly control the PCI device. During recovery flow SF should not be allowed to do pci disable or pci reset, its PF will do it. It fixes the following kernel trace: mlx5_core.sf mlx5_core.sf.25: mlx5_health_try_recover:387:(pid 40948): starting health recovery flow mlx5_core 0000:03:00.0: mlx5_pci_slot_reset was called mlx5_core 0000:03:00.0: wait vital counter value 0xab175 after 1 iterations mlx5_core.sf mlx5_core.sf.25: firmware version: 24.32.532 mlx5_core.sf mlx5_core.sf.23: mlx5_health_try_recover:387:(pid 40946): starting health recovery flow mlx5_core 0000:03:00.0: mlx5_pci_slot_reset was called mlx5_core 0000:03:00.0: wait vital counter value 0xab193 after 1 iterations mlx5_core.sf mlx5_core.sf.23: firmware version: 24.32.532 mlx5_core.sf mlx5_core.sf.25: mlx5_cmd_check:813:(pid 40948): ENABLE_HCA(0x104) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0x658908) mlx5_core.sf mlx5_core.sf.25: mlx5_function_setup:1292:(pid 40948): enable hca failed mlx5_core.sf mlx5_core.sf.25: mlx5_health_try_recover:389:(pid 40948): health recovery failed Fixes: 1958fc2f0712 ("net/mlx5: SF, Add auxiliary device driver") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
502e82b9 |
|
07-Nov-2021 |
Aya Levin <ayal@nvidia.com> |
net/mlx5: Fix access to a non-supported register Validate MRTC register is supported before triggering a delayed work which accesses it. Fixes: 5a1023deeed0 ("net/mlx5: Add periodic update of host time to firmware") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
76091b0f |
|
19-Oct-2021 |
Amir Tzin <amirtz@nvidia.com> |
net/mlx5: Fix use after free in mlx5_health_wait_pci_up The device health recovery flow calls mlx5_health_wait_pci_up() which queries the device for FW_RESET timeout after freeing the device timeouts structure on mlx5_function_teardown(). Fix this bug by moving timeouts structure init/cleanup to the device's init/uninit phases. Since it is necessary to reset default software timeouts on function reload, extract setting of defaults values from mlx5_tout_init() and call mlx5_tout_set_def_val() directly from mlx5_function_setup(). Fixes: 5945e1adeab5 ("net/mlx5: Read timeout values from init segment") Reported by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Amir Tzin <amirtz@nvidia.com> Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
7025329d |
|
09-Sep-2020 |
Ben Ben-Ishay <benishay@nvidia.com> |
net/mlx5: Add SHAMPO caps, HW bits and enumerations This commit adds SHAMPO bit to hca_cap and SHAMPO capabilities structure, SHAMPO related HW spec hardware fields and enumerations. SHAMPO stands for: split headers and merge payload offload. SHAMPO new fields: WQ: - headers_mkey: mkey that represents the headers buffer, where the packets headers will be written by the HW. - shampo_enable: flag to verify if the WQ supports SHAMPO feature. - log_reservation_size: the log of the reservation size where the data of the packet will be written by the HW. - log_max_num_of_packets_per_reservation: log of the maximum number of packets that can be written to the same reservation. - log_headers_entry_size: log of the header entry size of the headers buffer. - log_headers_buffer_entry_num: log of the entries number of the headers buffer. RQ: - shampo_no_match_alignment_granularity: the HW alignment granularity in case the received packet doesn't match the current session. - shampo_match_criteria_type: the type of match criteria. - reservation_timeout: the maximum time that the HW will hold the reservation. mlx5_ifc_shampo_cap_bits, the capabilities of the SHAMPO feature: - shampo_log_max_reservation_size: the maximum allowed value of the field WQ.log_reservation_size. - log_reservation_size: the minimum allowed value of the field WQ.log_reservation_size. - shampo_min_mss_size: the minimum payload size of packet that can open a new session or be merged to a session. - shampo_max_log_headers_entry_size: the maximum allowed value of the field WQ.log_headers_entry_size Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
6b367174 |
|
26-Oct-2021 |
Jakub Kicinski <kuba@kernel.org> |
net/mlx5: remove the recent devlink params revert commit 46ae40b94d88 ("net/mlx5: Let user configure io_eq_size param") revert commit a6cb08daa3b4 ("net/mlx5: Let user configure event_eq_size param") revert commit 554604061979 ("net/mlx5: Let user configure max_macs param") The EQE parameters are applicable to more drivers, they should be configured via standard API, probably ethtool. Example of another driver needing something similar: https://lore.kernel.org/all/1633454136-14679-3-git-send-email-sbhatta@marvell.com/ The last param for "max_macs" is probably fine but the documentation is severely lacking. The meaning and implications for changing the param need to be stated. Link: https://lore.kernel.org/r/20211026152939.3125950-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
55460406 |
|
15-Aug-2021 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: Let user configure max_macs param Currently, max_macs is taking 70Kbytes of memory per function. This size is not needed in all use cases, and is critical with large scale. Hence, allow user to configure the number of max_macs. For example, to reduce the number of max_macs to 1, execute:: $ devlink dev param set pci/0000:00:0b.0 name max_macs value 1 \ cmode driverinit $ devlink dev reload pci/0000:00:0b.0 Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
46ae40b9 |
|
12-Aug-2021 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: Let user configure io_eq_size param Currently, each I/O EQ is taking 128KB of memory. This size is not needed in all use cases, and is critical with large scale. Hence, allow user to configure the size of I/O EQs. For example, to reduce I/O EQ size to 64, execute: $ devlink resource set pci/0000:00:0b.0 path /io_eq_size/ size 64 $ devlink dev reload pci/0000:00:0b.0 Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
425a563a |
|
23-May-2021 |
Maor Gottlieb <maorg@nvidia.com> |
net/mlx5: Introduce port selection namespace Add new port selection flow steering namespace. Flow steering rules in this namespaceare are used to determine the physical port for egress packets. Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
fbfa97b4 |
|
18-Aug-2021 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: Disable roce at HCA level Currently, when a user disables roce via the devlink param, this change isn't passed down to the device. If device allows disabling RoCE at device level, make use of it. This instructs the device to skip memory allocations related to RoCE functionality which otherwise is done by the device. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
32def412 |
|
13-Oct-2021 |
Amir Tzin <amirtz@nvidia.com> |
net/mlx5: Read timeout values from DTOR Replace hard coded timeouts with values stored by firmware in default timeouts register (DTOR). Timeouts are read during driver load. If DTOR is not supported by firmware then fallback to hard coded defaults instead. Signed-off-by: Amir Tzin <amirtz@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
5945e1ad |
|
07-Oct-2021 |
Amir Tzin <amirtz@mellanox.com> |
net/mlx5: Read timeout values from init segment Replace hard coded timeouts with values stored in firmware's init segment. Timeouts are read from init segment during driver load. If init segment timeouts are not supported then fallback to hard coded defaults instead. Also move pre initialization timeouts which cannot be read from firmware to the new mechanism. Signed-off-by: Amir Tzin <amirtz@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
82465bec |
|
12-Oct-2021 |
Leon Romanovsky <leon@kernel.org> |
devlink: Delete reload enable/disable interface Commit a0c76345e3d3 ("devlink: disallow reload operation during device cleanup") added devlink_reload_{enable,disable}() APIs to prevent reload operation from racing with device probe/dismantle. After recent changes to move devlink_register() to the end of device probe and devlink_unregister() to the beginning of device dismantle, these races can no longer happen. Reload operations will be denied if the devlink instance is unregistered and devlink_unregister() will block until all in-flight operations are done. Therefore, remove these devlink_reload_{enable,disable}() APIs. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
f62eb932 |
|
02-Aug-2021 |
Aya Levin <ayal@nvidia.com> |
net/mlx5: Tolerate failures in debug features while driver load FW tracer and resource dump are debug features. Although failing to initialize them may indicate an error, don't let this stop device loading. Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
64ea2d0e |
|
25-Sep-2021 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5: Accept devlink user input after driver initialization complete The change of devlink_alloc() to accept device makes sure that device is fully initialized and device_register() does nothing except allowing users to use that devlink instance. Such change ensures that no user input will be usable till that point and it eliminates the need to worry about internal locking as long as devlink_register is called last since all accesses to the devlink are during initialization. This change fixes the following lockdep warning. ====================================================== WARNING: possible circular locking dependency detected 5.14.0-rc2+ #27 Not tainted ------------------------------------------------------ devlink/265 is trying to acquire lock: ffff8880133c2bc0 (&dev->intf_state_mutex){+.+.}-{3:3}, at: mlx5_unload_one+0x1e/0xa0 [mlx5_core] but task is already holding lock: ffffffff8362b468 (devlink_mutex){+.+.}-{3:3}, at: devlink_nl_pre_doit+0x2b/0x8d0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (devlink_mutex){+.+.}-{3:3}: __mutex_lock+0x149/0x1310 devlink_register+0xe7/0x280 mlx5_devlink_register+0x118/0x480 [mlx5_core] mlx5_init_one+0x34b/0x440 [mlx5_core] probe_one+0x480/0x6e0 [mlx5_core] pci_device_probe+0x2a0/0x4a0 really_probe+0x1cb/0xba0 __driver_probe_device+0x18f/0x470 driver_probe_device+0x49/0x120 __driver_attach+0x1ce/0x400 bus_for_each_dev+0x11e/0x1a0 bus_add_driver+0x309/0x570 driver_register+0x20f/0x390 0xffffffffa04a0062 do_one_initcall+0xd5/0x400 do_init_module+0x1c8/0x760 load_module+0x7d9d/0xa4b0 __do_sys_finit_module+0x118/0x1a0 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #0 (&dev->intf_state_mutex){+.+.}-{3:3}: __lock_acquire+0x2999/0x5a40 lock_acquire+0x1a9/0x4a0 __mutex_lock+0x149/0x1310 mlx5_unload_one+0x1e/0xa0 [mlx5_core] mlx5_devlink_reload_down+0x185/0x2b0 [mlx5_core] devlink_reload+0x1f2/0x640 devlink_nl_cmd_reload+0x6c3/0x10d0 genl_family_rcv_msg_doit+0x1e9/0x2f0 genl_rcv_msg+0x27f/0x4a0 netlink_rcv_skb+0x11e/0x340 genl_rcv+0x24/0x40 netlink_unicast+0x433/0x700 netlink_sendmsg+0x6fb/0xbe0 sock_sendmsg+0xb0/0xe0 __sys_sendto+0x192/0x240 __x64_sys_sendto+0xdc/0x1b0 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(devlink_mutex); lock(&dev->intf_state_mutex); lock(devlink_mutex); lock(&dev->intf_state_mutex); *** DEADLOCK *** 3 locks held by devlink/265: #0: ffffffff836371d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x15/0x40 #1: ffffffff83637288 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg+0x31a/0x4a0 #2: ffffffff8362b468 (devlink_mutex){+.+.}-{3:3}, at: devlink_nl_pre_doit+0x2b/0x8d0 stack backtrace: CPU: 0 PID: 265 Comm: devlink Not tainted 5.14.0-rc2+ #27 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack_lvl+0x45/0x59 check_noncircular+0x268/0x310 ? print_circular_bug+0x460/0x460 ? __kernel_text_address+0xe/0x30 ? alloc_chain_hlocks+0x1e6/0x5a0 __lock_acquire+0x2999/0x5a40 ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0 ? add_lock_to_list.constprop.0+0x6c/0x530 lock_acquire+0x1a9/0x4a0 ? mlx5_unload_one+0x1e/0xa0 [mlx5_core] ? lock_release+0x6c0/0x6c0 ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0 ? lock_is_held_type+0x98/0x110 __mutex_lock+0x149/0x1310 ? mlx5_unload_one+0x1e/0xa0 [mlx5_core] ? lock_is_held_type+0x98/0x110 ? mlx5_unload_one+0x1e/0xa0 [mlx5_core] ? find_held_lock+0x2d/0x110 ? mutex_lock_io_nested+0x1160/0x1160 ? mlx5_lag_is_active+0x72/0x90 [mlx5_core] ? lock_downgrade+0x6d0/0x6d0 ? do_raw_spin_lock+0x12e/0x270 ? rwlock_bug.part.0+0x90/0x90 ? mlx5_unload_one+0x1e/0xa0 [mlx5_core] mlx5_unload_one+0x1e/0xa0 [mlx5_core] mlx5_devlink_reload_down+0x185/0x2b0 [mlx5_core] ? netlink_broadcast_filtered+0x308/0xac0 ? mlx5_devlink_info_get+0x1f0/0x1f0 [mlx5_core] ? __build_skb_around+0x110/0x2b0 ? __alloc_skb+0x113/0x2b0 devlink_reload+0x1f2/0x640 ? devlink_unregister+0x1e0/0x1e0 ? security_capable+0x51/0x90 devlink_nl_cmd_reload+0x6c3/0x10d0 ? devlink_nl_cmd_get_doit+0x1e0/0x1e0 ? devlink_nl_pre_doit+0x72/0x8d0 genl_family_rcv_msg_doit+0x1e9/0x2f0 ? __lock_acquire+0x15e2/0x5a40 ? genl_family_rcv_msg_attrs_parse.constprop.0+0x240/0x240 ? mutex_lock_io_nested+0x1160/0x1160 ? security_capable+0x51/0x90 genl_rcv_msg+0x27f/0x4a0 ? genl_get_cmd+0x3c0/0x3c0 ? lock_acquire+0x1a9/0x4a0 ? devlink_nl_cmd_get_doit+0x1e0/0x1e0 ? lock_release+0x6c0/0x6c0 netlink_rcv_skb+0x11e/0x340 ? genl_get_cmd+0x3c0/0x3c0 ? netlink_ack+0x930/0x930 genl_rcv+0x24/0x40 netlink_unicast+0x433/0x700 ? netlink_attachskb+0x750/0x750 ? __alloc_skb+0x113/0x2b0 netlink_sendmsg+0x6fb/0xbe0 ? netlink_unicast+0x700/0x700 ? netlink_unicast+0x700/0x700 sock_sendmsg+0xb0/0xe0 __sys_sendto+0x192/0x240 ? __x64_sys_getpeername+0xb0/0xb0 ? do_sys_openat2+0x10a/0x370 ? down_write_nested+0x150/0x150 ? do_user_addr_fault+0x215/0xd50 ? __x64_sys_openat+0x11f/0x1d0 ? __x64_sys_open+0x1a0/0x1a0 __x64_sys_sendto+0xdc/0x1b0 ? syscall_enter_from_user_mode+0x1d/0x50 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f50b50b6b3a Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 c3 0f 1f 44 00 00 55 48 83 ec 30 44 89 4c RSP: 002b:00007fff6c0d3f38 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f50b50b6b3a RDX: 0000000000000038 RSI: 000055763ac08440 RDI: 0000000000000003 RBP: 000055763ac08410 R08: 00007f50b5192200 R09: 000000000000000c R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 000055763ac08410 R15: 000055763ac08440 mlx5_core 0000:00:09.0: firmware version: 4.8.9999 mlx5_core 0000:00:09.0: 0.000 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x255 link) mlx5_core 0000:00:09.0 eth1: Link up Fixes: a6f3b62386a0 ("net/mlx5: Move devlink registration before interfaces load") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
eb9c5c0d |
|
22-Aug-2021 |
Christophe JAILLET <christophe.jaillet@wanadoo.fr> |
net/mellanox: switch from 'pci_' to 'dma_' API The wrappers in include/linux/pci-dma-compat.h should go away. The patch has been generated with the coccinelle script below. It has been hand modified to use 'dma_set_mask_and_coherent()' instead of 'pci_set_dma_mask()/pci_set_consistent_dma_mask()' when applicable. This is less verbose. It has been compile tested. @@ @@ - PCI_DMA_BIDIRECTIONAL + DMA_BIDIRECTIONAL @@ @@ - PCI_DMA_TODEVICE + DMA_TO_DEVICE @@ @@ - PCI_DMA_FROMDEVICE + DMA_FROM_DEVICE @@ @@ - PCI_DMA_NONE + DMA_NONE @@ expression e1, e2, e3; @@ - pci_alloc_consistent(e1, e2, e3) + dma_alloc_coherent(&e1->dev, e2, e3, GFP_) @@ expression e1, e2, e3; @@ - pci_zalloc_consistent(e1, e2, e3) + dma_alloc_coherent(&e1->dev, e2, e3, GFP_) @@ expression e1, e2, e3, e4; @@ - pci_free_consistent(e1, e2, e3, e4) + dma_free_coherent(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_map_single(e1, e2, e3, e4) + dma_map_single(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_unmap_single(e1, e2, e3, e4) + dma_unmap_single(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4, e5; @@ - pci_map_page(e1, e2, e3, e4, e5) + dma_map_page(&e1->dev, e2, e3, e4, e5) @@ expression e1, e2, e3, e4; @@ - pci_unmap_page(e1, e2, e3, e4) + dma_unmap_page(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_map_sg(e1, e2, e3, e4) + dma_map_sg(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_unmap_sg(e1, e2, e3, e4) + dma_unmap_sg(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_single_for_cpu(e1, e2, e3, e4) + dma_sync_single_for_cpu(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_single_for_device(e1, e2, e3, e4) + dma_sync_single_for_device(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_sg_for_cpu(e1, e2, e3, e4) + dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_sg_for_device(e1, e2, e3, e4) + dma_sync_sg_for_device(&e1->dev, e2, e3, e4) @@ expression e1, e2; @@ - pci_dma_mapping_error(e1, e2) + dma_mapping_error(&e1->dev, e2) @@ expression e1, e2; @@ - pci_set_dma_mask(e1, e2) + dma_set_mask(&e1->dev, e2) @@ expression e1, e2; @@ - pci_set_consistent_dma_mask(e1, e2) + dma_set_coherent_mask(&e1->dev, e2) Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
44f66ac9 |
|
16-Jun-2021 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: Initialize numa node for all core devices Subsequent patches make use of numa node affinity for memory allocations. Initialize it for PCI PF, VF and SF devices. Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
48f02eef |
|
13-Jul-2021 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: Allocate individual capability Currently mlx5_core_dev contains array of capabilities. It contains 19 valid capabilities of the device, 2 reserved entries and 12 holes. Due to this for 14 unused entries, mlx5_core_dev allocates 14 * 8K = 112K bytes of memory which is never used. Due to this mlx5_core_dev structure size is 270Kbytes odd. This allocation further aligns to next power of 2 to 512Kbytes. By skipping non-existent entries, (a) 112Kbyte is saved, (b) mlx5_core_dev reduces to 8KB with alignment (c) 350KB saved in alignment In future individual capability allocation can be used to skip its allocation when such capability is disabled at the device level. This patch prepares mlx5_core_dev to hold capability using a pointer instead of inline array. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
5958a6fa |
|
12-Jul-2021 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: Reorganize current and maximal capabilities to be per-type In the current code, the current and maximal capabilities are maintained in separate arrays which are both per type. In order to allow the creation of such a basic structure as a dynamically allocated array, we move curr and max fields to a unified structure so that specific capabilities can be allocated as one unit. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
8e792700 |
|
01-Aug-2021 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5: Delete impossible dev->state checks New mlx5_core device structure is allocated through devlink_alloc with\ kzalloc and that ensures that all fields are equal to zero and it includes ->state too. That means that checks of that field in the mlx5_init_one() is completely redundant, because that function is called only once in the begging of mlx5_core_dev lifetime. PCI: .probe() -> probe_one() -> mlx5_init_one() The recovery flow can't run at that time or before it, because relevant work initialized later in mlx5_init_once(). Such initialization flow ensures that dev->state can't be MLX5_DEVICE_STATE_UNINITIALIZED at all, so remove such impossible checks. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
919d13a7 |
|
08-Aug-2021 |
Leon Romanovsky <leon@kernel.org> |
devlink: Set device as early as possible All kernel devlink implementations call to devlink_alloc() during initialization routine for specific device which is used later as a parent device for devlink_register(). Such late device assignment causes to the situation which requires us to call to device_register() before setting other parameters, but that call opens devlink to the world and makes accessible for the netlink users. Any attempt to move devlink_register() to be the last call generates the following error due to access to the devlink->dev pointer. [ 8.758862] devlink_nl_param_fill+0x2e8/0xe50 [ 8.760305] devlink_param_notify+0x6d/0x180 [ 8.760435] __devlink_params_register+0x2f1/0x670 [ 8.760558] devlink_params_register+0x1e/0x20 The simple change of API to set devlink device in the devlink_alloc() instead of devlink_register() fixes all this above and ensures that prior to call to devlink_register() everything already set. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cac1eb2c |
|
03-Aug-2021 |
Mark Bloch <mbloch@nvidia.com> |
net/mlx5: Lag, properly lock eswitch if needed Currently when doing hardware lag we check the eswitch mode but as this isn't done under a lock the check isn't valid. As the code needs to sync between two different devices an extra care is needed. - When going to change eswitch mode, if hardware lag is active destroy it. - While changing eswitch modes block any hardware bond creation. - Delay handling bonding events until there are no mode changes in progress. - When attaching a new mdev to lag, block until there is no mode change in progress. In order for the mode change to finish the interface lock will have to be taken. Release the lock and sleep for 100ms to allow forward progress. As this is a very rare condition (can happen if the user unbinds and binds a PCI function while also changing eswitch mode of the other PCI function) it has no real world impact. As taking multiple eswitch mode locks is now required lockdep will complain about a possible deadlock. Register a key per eswitch to make lockdep happy. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
c633e799 |
|
02-Aug-2021 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5: Don't skip subfunction cleanup in case of error in module init Clean SF resources if mlx5 eth failed to initialize. Fixes: 1958fc2f0712 ("net/mlx5: SF, Add auxiliary device driver") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
3b43190b |
|
06-Apr-2021 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: Introduce API for request and release IRQs Introduce new API that will allow IRQs users to hold a pointer to mlx5_irq. In the end of this series, IRQs will be allocated on demand. Hence, this will allow us to properly manage and use IRQs. Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
8a66e458 |
|
14-Apr-2021 |
Mark Bloch <mbloch@nvidia.com> |
net/mlx5: Change ownership model for lag Lag is used to combine two PCI functions of the same HCA into a single logical unit. This is a core functionality and as such should be managed by the core driver. Currently this isn't the case. While we store the lag software structure inside the lower device, its lifetime (creation / destruction) is dictated by the mlx5e part. Change the ownership model so lag is tied to the lifetime of the lower level driver instead to the mlx5e part. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
94a4b841 |
|
08-Mar-2021 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5: Fix error path for set HCA defaults In the case of the failure to execute mlx5_core_set_hca_defaults(), we used wrong goto label to execute error unwind flow. Fixes: 5bef709d76a2 ("net/mlx5: Enable host PF HCA after eswitch is initialized") Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
3410fbcd |
|
12-May-2021 |
Maor Gottlieb <maorg@nvidia.com> |
{net, RDMA}/mlx5: Fix override of log_max_qp by other device mlx5_core_dev holds pointer to static profile, hence when the log_max_qp of the profile is override by some device, then it effect all other mlx5 devices that share the same profile. Fix it by having a profile instance for every mlx5 device. Fixes: 883371c453b9 ("net/mlx5: Check FW limitations on log_max_qp before setting it") Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
e71b75f7 |
|
14-Mar-2021 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5: Implement sriov_get_vf_total_msix/count() callbacks The mlx5 implementation executes a firmware command on the PF to change the configuration of the selected VF. Link: https://lore.kernel.org/linux-pci/20210314124256.70253-5-leon@kernel.org Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
#
604774ad |
|
14-Mar-2021 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5: Dynamically assign MSI-X vectors count The number of MSI-X vectors is a PCI property visible through lspci. The field is read-only and configured by the device. The mlx5 devices work in a static or dynamic assignment mode. Static assignment means that all newly created VFs have a preset number of MSI-X vectors determined by device configuration parameters. This can result in some VFs having too many or too few MSI-X vectors. Till now this has been the only means of fine-tuning the MSI-X vector count and it was acceptable for small numbers of VFs. With dynamic assignment the inefficiency of having a fixed number of MSI-X vectors can be avoided with each VF having exactly the required vectors. Userspace will provide this information while provisioning the VF for use, based on the intended use. For instance if being used with a VM, the MSI-X vector count might be matched to the CPU count of the VM. For compatibility mlx5 continues to start up with MSI-X vector assignment, but the kernel can now access a larger dynamic vector pool and assign more vectors to created VFs. Link: https://lore.kernel.org/linux-pci/20210314124256.70253-4-leon@kernel.org Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
#
fe06992b |
|
03-Nov-2020 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5: Check returned value from health recover sequence MLX5_INTERFACE_STATE_UP is far from being reliable check for success to recover, because it can be changed any time and health logic doesn't have any locks to protect from it. The locks are not needed here because health recover is good to have, but not must to success, so rely on the returned value from the mlx5_recover_device() as a marker for success/failure. Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
6dea2f7e |
|
02-Nov-2020 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5: Separate probe vs. reload flows The mix between probe/unprobe and reload flows causes to have an extra mutex lock intf_state_mutex that generates LOCKDEP warning between it and devlink_mutex. As a preparation for the future removal, separate those flows. Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
88a68672 |
|
12-Feb-2021 |
Sasha Levin <sashal@kernel.org> |
kbuild: simplify access to the kernel's version Instead of storing the version in a single integer and having various kernel (and userspace) code how it's constructed, export individual (major, patchlevel, sublevel) components and simplify kernel code that uses it. This should also make it easier on userspace. Signed-off-by: Sasha Levin <sashal@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
#
d89ddaae |
|
30-Dec-2020 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: Disable devlink reload for multi port slave device Devlink reload can't be allowed on a multi port slave device, because reload of slave device doesn't take effect. The right flow is to disable devlink reload for multi port slave device. Hence, disabling it in mlx5_core probing. Fixes: 4383cfcc65e7 ("net/mlx5: Add devlink reload") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
3d347b1b |
|
26-Jan-2021 |
Aya Levin <ayal@nvidia.com> |
net/mlx5: Add support for devlink traps in mlx5 core driver Add devlink traps infra-structure to mlx5 core driver. Add traps list to mlx5_priv and corresponding API: - mlx5_devlink_trap_report() to wrap trap reports to devlink - mlx5_devlink_trap_get_num_active() to decide whether to open/close trap resources. Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
6a327321 |
|
11-Dec-2020 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: SF, Port function state change support Support changing the state of the SF port's function through devlink. When activating the SF port's function, enable the hca in the device followed by adding its auxiliary device. When deactivating the SF port's function, delete its auxiliary device followed by disabling the vHCA. Port function attributes get/set callbacks are invoked with devlink instance lock held. Such callbacks need to synchronize with sf port table getting disabled either via sriov sysfs callback. Such callbacks synchronize with table disable context holding table refcount. $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev $ devlink port show pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false $ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88 pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false function: hw_addr 00:00:00:00:00:00 state inactive opstate detached $ devlink port show ens2f0npf0sf88 pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false function: hw_addr 00:00:00:00:88:88 state inactive opstate detached $ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88 state active $ devlink port show ens2f0npf0sf88 -jp { "port": { "pci/0000:06:00.0/32768": { "type": "eth", "netdev": "ens2f0npf0sf88", "flavour": "pcisf", "controller": 0, "pfnum": 0, "sfnum": 88, "external": false, "splittable": false, "function": { "hw_addr": "00:00:00:00:88:88", "state": "active", "opstate": "attached" } } } } On port function activation, an auxiliary device is created in below example. $ devlink dev show devlink dev show auxiliary/mlx5_core.sf.4 $ devlink port show auxiliary/mlx5_core.sf.4/1 auxiliary/mlx5_core.sf.4/1: type eth netdev p0sf88 flavour virtual port 0 splittable false Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
8f010541 |
|
11-Dec-2020 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: SF, Add port add delete functionality To handle SF port management outside of the eswitch as independent software layer, introduce eswitch notifier APIs so that mlx5 upper layer who wish to support sf port management in switchdev mode can perform its task whenever eswitch mode is set to switchdev or before eswitch is disabled. Initialize sf port table on such eswitch event. Add SF port add and delete functionality in switchdev mode. Destroy all SF ports when eswitch is disabled. Expose SF port add and delete to user via devlink commands. $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev $ devlink port show pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false $ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88 pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false function: hw_addr 00:00:00:00:00:00 state inactive opstate detached $ devlink port show ens2f0npf0sf88 pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false function: hw_addr 00:00:00:00:00:00 state inactive opstate detached or by its unique port index: $ devlink port show pci/0000:06:00.0/32768 pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false function: hw_addr 00:00:00:00:00:00 state inactive opstate detached $ devlink port show ens2f0npf0sf88 -jp { "port": { "pci/0000:06:00.0/32768": { "type": "eth", "netdev": "ens2f0npf0sf88", "flavour": "pcisf", "controller": 0, "pfnum": 0, "sfnum": 88, "external": false, "splittable": false, "function": { "hw_addr": "00:00:00:00:00:00", "state": "inactive", "opstate": "detached" } } } } Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
1958fc2f |
|
11-Dec-2020 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: SF, Add auxiliary device driver Add auxiliary device driver for mlx5 subfunction auxiliary device. A mlx5 subfunction is similar to PCI PF and VF. For a subfunction an auxiliary device is created. As a result, when mlx5 SF auxiliary device binds to the driver, its netdev and rdma device are created, they appear as $ ls -l /sys/bus/auxiliary/devices/ mlx5_core.sf.4 -> ../../../devices/pci0000:00/0000:00:03.0/0000:06:00.0/mlx5_core.sf.4 $ ls -l /sys/class/net/eth1/device /sys/class/net/eth1/device -> ../../../mlx5_core.sf.4 $ cat /sys/bus/auxiliary/devices/mlx5_core.sf.4/sfnum 88 $ devlink dev show pci/0000:06:00.0 auxiliary/mlx5_core.sf.4 $ devlink port show auxiliary/mlx5_core.sf.4/1 auxiliary/mlx5_core.sf.4/1: type eth netdev p0sf88 flavour virtual port 0 splittable false $ rdma link show mlx5_0/1 link mlx5_0/1 state ACTIVE physical_state LINK_UP netdev p0sf88 $ rdma dev show 8: rocep6s0f1: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d113 sys_image_guid 248a:0703:00b3:d112 13: mlx5_0: node_type ca fw 16.29.0550 node_guid 0000:00ff:fe00:8888 sys_image_guid 248a:0703:00b3:d112 In future, devlink device instance name will adapt to have sfnum annotation using either an alias or as devlink instance name described in RFC [1]. [1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/ Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
90d010b8 |
|
11-Dec-2020 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: SF, Add auxiliary device support Introduce API to add and delete an auxiliary device for an SF. Each SF has its own dedicated window in the PCI BAR 2. SF device is similar to PCI PF and VF that supports multiple class of devices such as net, rdma and vdpa. SF device will be added or removed in subsequent patch during SF devlink port function state change command. A subfunction device exposes user supplied subfunction number which will be further used by systemd/udev to have deterministic name for its netdevice and rdma device. An mlx5 subfunction auxiliary device example: $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev $ devlink port show pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false $ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88 pci/0000:08:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false function: hw_addr 00:00:00:00:00:00 state inactive opstate detached $ devlink port show ens2f0npf0sf88 pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false function: hw_addr 00:00:00:00:88:88 state inactive opstate detached $ devlink port function set ens2f0npf0sf88 hw_addr 00:00:00:00:88:88 state active On activation, $ ls -l /sys/bus/auxiliary/devices/ mlx5_core.sf.4 -> ../../../devices/pci0000:00/0000:00:03.0/0000:06:00.0/mlx5_core.sf.4 $ cat /sys/bus/auxiliary/devices/mlx5_core.sf.4/sfnum 88 Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
f3196bb0 |
|
11-Dec-2020 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: Introduce vhca state event notifier vhca state events indicates change in the state of the vhca that may occur due to a SF allocation, deallocation or enabling/disabling the SF HCA. Introduce vhca state event handler which will be used by SF devlink port manager and SF hardware id allocator in subsequent patches to act on the event. This enables single entity to subscribe, query and rearm the event for a function. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
4d8be211 |
|
04-Jan-2021 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5: Release devlink object if adev fails Add missed freeing previously allocated devlink object. Fixes: a925b5e309c9 ("net/mlx5: Register mlx5 devices to auxiliary virtual bus") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
93f82444 |
|
03-Oct-2020 |
Leon Romanovsky <leon@kernel.org> |
RDMA/mlx5: Convert mlx5_ib to use auxiliary bus The conversion to auxiliary bus solves long standing issue with existing mlx5_ib<->mlx5_core coupling. It required to have both modules in initramfs if one of them needed for the boot. Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
#
912cebf4 |
|
04-Oct-2020 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5e: Connect ethernet part to auxiliary bus Reuse auxiliary bus to perform device management of the ethernet part of the mlx5 driver. Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
#
a925b5e3 |
|
08-Oct-2020 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5: Register mlx5 devices to auxiliary virtual bus Create auxiliary devices under new virtual bus. This will replace the custom-made mlx5 ->add()/->remove() interfaces and next patches will fill the missing callback and remove the old interface logic. The attachment of auxiliary drivers to the devices is possible in 1-to-1 manner only and it requires us to create device for every protocol, so that device (module) will be able to connect to it. System with 2 IB and 1 RoCE cards: [leonro@vm ~]$ lspci |grep nox 00:09.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5] 00:0a.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6] 00:0b.0 Ethernet controller: Mellanox Technologies MT2910 Family [ConnectX-7] [leonro@vm ~]$ ls -l /sys/bus/auxiliary/devices/ mlx5_core.eth.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.eth.2 mlx5_core.rdma.0 -> ../../../devices/pci0000:00/0000:00:09.0/mlx5_core.rdma.0 mlx5_core.rdma.1 -> ../../../devices/pci0000:00/0000:00:0a.0/mlx5_core.rdma.1 mlx5_core.rdma.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.rdma.2 mlx5_core.vdpa.1 -> ../../../devices/pci0000:00/0000:00:0a.0/mlx5_core.vdpa.1 mlx5_core.vdpa.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.vdpa.2 [leonro@vm ~]$ rdma dev 0: ibp0s9: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455 1: ibp0s10: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3456 sys_image_guid 5254:00c0:fe12:3456 2: rdmap0s11: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3457 sys_image_guid 5254:00c0:fe12:3457 System with RoCE SR-IOV card with 4 VFs: [leonro@vm ~]$ lspci |grep nox 01:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6] 01:00.1 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function] 01:00.2 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function] 01:00.3 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function] 01:00.4 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function] [leonro@vm ~]$ ls -l /sys/bus/auxiliary/devices/ mlx5_core.eth.0 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.0/mlx5_core.eth.0 mlx5_core.eth.1 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.eth.1 mlx5_core.eth.2 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.eth.2 mlx5_core.eth.3 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.eth.3 mlx5_core.eth.4 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.eth.4 mlx5_core.rdma.0 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.0/mlx5_core.rdma.0 mlx5_core.rdma.1 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.rdma.1 mlx5_core.rdma.2 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.rdma.2 mlx5_core.rdma.3 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.rdma.3 mlx5_core.rdma.4 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.rdma.4 mlx5_core.vdpa.1 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.vdpa.1 mlx5_core.vdpa.2 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.vdpa.2 mlx5_core.vdpa.3 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.vdpa.3 mlx5_core.vdpa.4 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.vdpa.4 [leonro@vm ~]$ rdma dev 0: rocep1s0f0: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455 1: rocep1s0f0v0: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3456 2: rocep1s0f0v1: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3457 3: rocep1s0f0v2: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3458 4: rocep1s0f0v3: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3459 Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
#
17a7612b |
|
04-Oct-2020 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5_core: Clean driver version and name Remove exposed driver version as it was done in other drivers, so module version will work correctly by displaying the kernel version for which it is compiled. And move mlx5_core module name to general include, so auxiliary drivers will be able to use it as a basis for a name in their device ID tables. Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
#
907af0f0 |
|
15-Oct-2020 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5: Properly convey driver version to firmware mlx5 firmware expects driver version in specific format X.X.X, so make it always correct and based on real kernel version aligned with the driver. Fixes: 012e50e109fd ("net/mlx5: Set driver version into firmware") Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
#
5bef709d |
|
20-Nov-2020 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: Enable host PF HCA after eswitch is initialized Currently ECPF enables external host PF too early in the initialization sequence for Ethernet links when ECPF is eswitch manager. Due to this, when external host PF driver is loaded, host PF's HCA CAP has inner_ip_version supported by NIC RX flow table. This capability is later updated by firmware after ECPF driver enables ENCAP/DECAP as eswitch manager. This results into a timing race condition, where CREATE_TIR command fails with a below syndrome on host PF. mlx5_cmd_check:775:(pid 510): CREATE_TIR(0x900) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x562b00) Hence, enable the external host PF after necessary eswitch and per vport initialization is completed. Continue to enable host PF when eswitch manager capability is off for a ECPF. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Bodong Wang <bodong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
dd8595ea |
|
20-Nov-2020 |
Meir Lichtinger <meirl@nvidia.com> |
net/mlx5: Update the list of the PCI supported devices Add the upcoming BlueField-3 device ID. Signed-off-by: Meir Lichtinger <meirl@nvidia.com> Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
38b9f903 |
|
07-Oct-2020 |
Moshe Shemesh <moshe@mellanox.com> |
net/mlx5: Handle sync reset request event Once the driver gets sync_reset_request from firmware it prepares for the coming reset and sends acknowledge. After getting this event the driver expects device reset, either it will trigger PCI reset on sync_reset_now event or such PCI reset will be triggered by another PF of the same device. So it moves to reset requested mode and if it gets PCI reset triggered by the other PF it detect the reset and reloads. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
e7f4d0bc |
|
07-Oct-2020 |
Moshe Shemesh <moshe@mellanox.com> |
net/mlx5: Set cap for pci sync for fw update event Set capability to notify the firmware that this host driver is capable of handling pci sync for firmware update events. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
7be3412a |
|
09-Sep-2020 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: Use dma device access helper Use the PCI device directly for dma accesses as non PCI device unlikely support IOMMU and dma mappings. Introduce and use helper routine to access DMA device. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
9a6ad1ad |
|
18-Nov-2019 |
Raed Salem <raeds@mellanox.com> |
net/mlx5: Accel, Add core IPsec support for the Connect-X family This to set the base for downstream patches to support the new IPsec implementation of the Connect-X family. Following modifications made: - Remove accel layer dependency from MLX5_FPGA_IPSEC. - Introduce accel_ipsec_ops, each IPsec device will have to support these ops. Signed-off-by: Raed Salem <raeds@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
4dca6509 |
|
20-May-2020 |
Michael Guralnik <michaelgur@mellanox.com> |
net/mlx5: Enable QP number request when creating IPoIB underlay QP If in the process of creating the underlay QP for an IPoIB interface the user has set the address and specifically the 1st-3rd bytes representing the QP number, use the requested QP number when creating the underlay QP. For a user to be able to request a QP number on QP creation, the MKEY_BY_NAME NVCONFIG should be set. As mkey_by_name and qp_by_name are coupled in FW. This requires driver to query the mkey_by_name max cap during initialization and set the current cap if it was enabled in FW. Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
17e73d47 |
|
02-Jun-2020 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5: Don't fail driver on failure to create debugfs Clang warns: drivers/net/ethernet/mellanox/mlx5/core/main.c:1278:6: warning: variable 'err' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized] if (!priv->dbg_root) { ^~~~~~~~~~~~~~~ drivers/net/ethernet/mellanox/mlx5/core/main.c:1303:9: note: uninitialized use occurs here return err; ^~~ drivers/net/ethernet/mellanox/mlx5/core/main.c:1278:2: note: remove the 'if' if its condition is always false if (!priv->dbg_root) { ^~~~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/mellanox/mlx5/core/main.c:1259:9: note: initialize the variable 'err' to silence this warning int err; ^ = 0 1 warning generated. The check of returned value of debugfs_create_dir() is wrong because by the design debugfs failures should never fail the driver and the check itself was wrong too. The kernel compiled without CONFIG_DEBUG_FS will return ERR_PTR(-ENODEV) and not NULL as expected. Fixes: 11f3b84d7068 ("net/mlx5: Split mdev init and pci init") Link: https://github.com/ClangBuiltLinux/linux/issues/1042 Reported-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
98f91c45 |
|
15-May-2020 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Fix devlink objects and devlink device unregister sequence Current below problems exists. 1. devlink device is registered by mlx5_load_one(). But it is not unregistered by mlx5_unload_one(). This is incorrect. 2. Above issue leads to, When mlx5 PCI device is removed, currently devlink device is unregistered before devlink ports are unregistered in below ladder diagram. remove_one() mlx5_devlink_unregister() [..] devlink_unregister() <- ports are still registered! mlx5_unload_one() mlx5_unregister_device() mlx5_remove_device() mlx5e_remove() mlx5e_devlink_port_unregister() devlink_port_unregister() 3. Condition checking for registering and unregister device are not symmetric either in these routines. Hence, fix the sequence by having load and unload routines symmetric and in right order. i.e. (a) register devlink device followed by registering devlink ports (b) unregister devlink ports followed by devlink device Do this based on boot and cleanup flags instead of different conditions. Fixes: c6acd629eec7 ("net/mlx5e: Add support for devlink-port in non-representors mode") Fixes: f60f315d339e ("net/mlx5e: Register devlink ports for physical link, PCI PF, VFs") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
60904cd3 |
|
14-May-2020 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Disable reload while removing the device While unregistration is in progress, user might be reloading the interface. This can race with unregistration in below flow which uses the resources which are getting disabled by reload flow. Hence, disable the devlink reloading first when removing the device. CPU0 CPU1 ---- ---- local_pci_remove() devlink_mutex remove_one() devlink_nl_cmd_reload() mlx5_unregister_device() devlink_reload() ops->reload_down() mlx5_unload_one() Fixes: 4383cfcc65e7 ("net/mlx5: Add devlink reload") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
42ea9f1b |
|
06-May-2020 |
Shay Drory <shayd@mellanox.com> |
net/mlx5: drain health workqueue in case of driver load error In case there is a work in the health WQ when we teardown the driver, in driver load error flow, the health work will try to read dev->iseg, which was already unmap in mlx5_pci_close(). Fix it by draining the health workqueue first thing in mlx5_pci_close(). Trace of the error: BUG: unable to handle page fault for address: ffffb5b141c18014 PF: supervisor read access in kernel mode PF: error_code(0x0000) - not-present page PGD 1fe95d067 P4D 1fe95d067 PUD 1fe95e067 PMD 1b7823067 PTE 0 Oops: 0000 [#1] SMP PTI CPU: 3 PID: 6755 Comm: kworker/u128:2 Not tainted 5.2.0-net-next-mlx5-hv_stats-over-last-worked-hyperv #1 Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006 04/28/2016 Workqueue: mlx5_healtha050:00:02.0 mlx5_fw_fatal_reporter_err_work [mlx5_core] RIP: 0010:ioread32be+0x30/0x40 Code: 00 77 27 48 81 ff 00 00 01 00 76 07 0f b7 d7 ed 0f c8 c3 55 48 c7 c6 3b ee d5 9f 48 89 e5 e8 67 fc ff ff b8 ff ff ff ff 5d c3 <8b> 07 0f c8 c3 66 66 2e 0f 1f 84 00 00 00 00 00 48 81 fe ff ff 03 RSP: 0018:ffffb5b14c56fd78 EFLAGS: 00010292 RAX: ffffb5b141c18000 RBX: ffff8e9f78a801c0 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffff8e9f7ecd7628 RDI: ffffb5b141c18014 RBP: ffffb5b14c56fd90 R08: 0000000000000001 R09: 0000000000000000 R10: ffff8e9f372a2c30 R11: ffff8e9f87f4bc40 R12: ffff8e9f372a1fc0 R13: ffff8e9f78a80000 R14: ffffffffc07136a0 R15: ffff8e9f78ae6f20 FS: 0000000000000000(0000) GS:ffff8e9f7ecc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffb5b141c18014 CR3: 00000001c8f82006 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? mlx5_health_try_recover+0x4d/0x270 [mlx5_core] mlx5_fw_fatal_reporter_recover+0x16/0x20 [mlx5_core] devlink_health_reporter_recover+0x1c/0x50 devlink_health_report+0xfb/0x240 mlx5_fw_fatal_reporter_err_work+0x65/0xd0 [mlx5_core] process_one_work+0x1fb/0x4e0 ? process_one_work+0x16b/0x4e0 worker_thread+0x4f/0x3d0 kthread+0x10d/0x140 ? process_one_work+0x4e0/0x4e0 ? kthread_cancel_delayed_work_sync+0x20/0x20 ret_from_fork+0x1f/0x30 Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 nfs fscache 8021q garp mrp stp llc ipmi_devintf ipmi_msghandler rpcrdma rdma_ucm ib_iser rdma_cm ib_umad iw_cm ib_ipoib libiscsi scsi_transport_iscsi ib_cm mlx5_ib ib_uverbs ib_core mlx5_core sb_edac crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 mlxfw crypto_simd cryptd glue_helper input_leds hyperv_fb intel_rapl_perf joydev serio_raw pci_hyperv pci_hyperv_mini mac_hid hv_balloon nfsd auth_rpcgss nfs_acl lockd grace sunrpc sch_fq_codel ip_tables x_tables autofs4 hv_utils hid_generic hv_storvsc ptp hid_hyperv hid hv_netvsc hyperv_keyboard pps_core scsi_transport_fc psmouse hv_vmbus i2c_piix4 floppy pata_acpi CR2: ffffb5b141c18014 ---[ end trace b12c5503157cad24 ]--- RIP: 0010:ioread32be+0x30/0x40 Code: 00 77 27 48 81 ff 00 00 01 00 76 07 0f b7 d7 ed 0f c8 c3 55 48 c7 c6 3b ee d5 9f 48 89 e5 e8 67 fc ff ff b8 ff ff ff ff 5d c3 <8b> 07 0f c8 c3 66 66 2e 0f 1f 84 00 00 00 00 00 48 81 fe ff ff 03 RSP: 0018:ffffb5b14c56fd78 EFLAGS: 00010292 RAX: ffffb5b141c18000 RBX: ffff8e9f78a801c0 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffff8e9f7ecd7628 RDI: ffffb5b141c18014 RBP: ffffb5b14c56fd90 R08: 0000000000000001 R09: 0000000000000000 R10: ffff8e9f372a2c30 R11: ffff8e9f87f4bc40 R12: ffff8e9f372a1fc0 R13: ffff8e9f78a80000 R14: ffffffffc07136a0 R15: ffff8e9f78ae6f20 FS: 0000000000000000(0000) GS:ffff8e9f7ecc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffb5b141c18014 CR3: 00000001c8f82006 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:38 in_atomic(): 0, irqs_disabled(): 1, pid: 6755, name: kworker/u128:2 INFO: lockdep is turned off. CPU: 3 PID: 6755 Comm: kworker/u128:2 Tainted: G D 5.2.0-net-next-mlx5-hv_stats-over-last-worked-hyperv #1 Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006 04/28/2016 Workqueue: mlx5_healtha050:00:02.0 mlx5_fw_fatal_reporter_err_work [mlx5_core] Call Trace: dump_stack+0x63/0x88 ___might_sleep+0x10a/0x130 __might_sleep+0x4a/0x80 exit_signals+0x33/0x230 ? blocking_notifier_call_chain+0x16/0x20 do_exit+0xb1/0xc30 ? kthread+0x10d/0x140 ? process_one_work+0x4e0/0x4e0 Fixes: 52c368dc3da7 ("net/mlx5: Move health and page alloc init to mdev_init") Signed-off-by: Shay Drory <shayd@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
8fc3e29b |
|
20-May-2020 |
Mark Bloch <markb@mellanox.com> |
net/mlx5: Fix crash upon suspend/resume Currently a Linux system with the mlx5 NIC always crashes upon hibernation - suspend/resume. Add basic callbacks so the NIC could be suspended and resumed. Fixes: 9603b61de1ee ("mlx5: Move pci device handling from mlx5_ib to mlx5_core") Tested-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
810cbb25 |
|
14-May-2020 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Add missing mutex destroy Add mutex destroy calls to balance with mutex_init() done in the init path. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
4f7400d5 |
|
06-May-2020 |
Shay Drory <shayd@mellanox.com> |
net/mlx5: Fix error flow in case of function_setup failure Currently, if an error occurred during mlx5_function_setup(), we keep dev->state as DEVICE_STATE_UP. Fixing it by adding a goto label. Fixes: e161105e58da ("net/mlx5: Function setup/teardown procedures") Signed-off-by: Shay Drory <shayd@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
f7936ddd |
|
19-Mar-2020 |
Eran Ben Elisha <eranbe@mellanox.com> |
net/mlx5: Avoid processing commands before cmdif is ready When driver is reloading during recovery flow, it can't get new commands till command interface is up again. Otherwise we may get to null pointer trying to access non initialized command structures. Add cmdif state to avoid processing commands while cmdif is not ready. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
555af0c3 |
|
15-May-2020 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Move iseg access helper routines close to mlx5_core driver Only mlx5_core driver handles fw initialization check and command interface revision check. Hence move them inside the mlx5_core driver where it is used. This avoid exposing these helpers to all mlx5 drivers. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
90bf1c8d |
|
07-May-2020 |
Eran Ben Elisha <eranbe@mellanox.com> |
net/mlx5: Move internal timer read function to clock library Move mlx5_read_internal_timer() into lib/clock.c file as it is being used there. As such, make this function a static one. In addition, rearrange headers include to support function move. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
41798df9 |
|
01-May-2020 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Drain wq first during PCI device removal mlx5_unload_one() is done with cleanup = true only once. So instead of doing health wq drain inside the if(), directly do during PCI device removal. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
4162f58b |
|
01-May-2020 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Have single error unwinding path Having multiple error unwinding path are error prone. Lets have just one error unwinding path. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
c6168161 |
|
01-Apr-2020 |
Eran Ben Elisha <eranbe@mellanox.com> |
net/mlx5: Add support for release all pages event If FW sets release_all_pages bit in MLX5_EVENT_TYPE_PAGE_REQUEST, driver shall release all pages of a given function id, with no further pages reclaim negotiation with FW nor MANAGE_PAGES commands from driver towards FW. Upon receiving this bit as part of pages reclaim event, driver will initiate release all flow, in which it will iterate and release all function's pages. As part of driver <-> FW capabilities handshake, FW will report release_all_pages max HCA cap bit, and driver will set the release_all_pages bit in HCA cap. NIC: ConnectX-4 Lx CPU: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz Test case: Simulataniously FLR 4 VFs, and measure FW release pages by driver. Before: 3.18 Sec After: 0.31 Sec Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
3ac0e69e |
|
09-Apr-2020 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5: Update main.c new cmd interface Do mass update of main.c to reuse newly introduced mlx5_cmd_exec_in*() interfaces. Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
c89da067 |
|
08-Mar-2020 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Read embedded cpu bit only once Embedded CPU bit doesn't change with PCI resume/suspend. Hence read it only once while probing the PCI device. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
59e9e8e4 |
|
13-Jan-2020 |
Mark Zhang <markz@mellanox.com> |
net/mlx5: Enable SW-defined RoCEv2 UDP source port When this is enabled, UDP source port for RoCEv2 packets are defined by software instead of firmware. Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
a2a322f4 |
|
19-Mar-2020 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5: Refactor HCA capability set flow Reduce the amount of kzalloc/kfree cycles by allocating command structure in the parent function and leverage the knowledge that set_caps() is called for HCA capabilities only with specific HW structure as parameter to calculate mailbox size. Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
333fbaa0 |
|
04-Apr-2020 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5: Move QP logic to mlx5_ib The mlx5_core doesn't need any functionality coded in qp.c, so move that file to drivers/infiniband/ be under mlx5_ib responsibility. Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
f999b706 |
|
08-Mar-2020 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Simplify mlx5_unload_one() and its callers mlx5_unload_one() always returns 0. Simplify callers of mlx5_unload_one() and remove the dead code. Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
ecd01db8 |
|
08-Mar-2020 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Simplify mlx5_register_device to return void mlx5_register_device() doesn't check for any error and always returns 0. Simplify mlx5_register_device() to return void and its caller, remove dead code related to it. Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
fc6a9f86 |
|
10-Mar-2020 |
Saeed Mahameed <saeedm@mellanox.com> |
{IB,net}/mlx5: Assign mkey variant in mlx5_ib only mkey variant is not required for mlx5_core use, move the mkey variant counter to mlx5_ib. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
12206b17 |
|
11-Feb-2020 |
Aya Levin <ayal@mellanox.com> |
net/mlx5: Add support for resource dump On driver load: - Initialize resource dump data structure and memory access tools (mkey & pd). - Read the resource dump's menu which contains the FW segment identifier. Each record is identified by the segment name (ASCII). During the driver's course of life, users (like reporters) may request dumps per segment. The user should create a command providing the segment identifier (SW enumeration) and command keys. In return, the user receives a command context. In order to receive the dump, the user should supply the command context and a memory (aligned to a PAGE) on which the dump content will be written. Since the dump may be larger than the given memory, the user may resubmit the command until received an indication of end-of-dump. It is the user's responsibility to destroy the command. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
505a7f54 |
|
12-Dec-2019 |
Meir Lichtinger <meirl@mellanox.com> |
net/mlx5: Update the list of the PCI supported devices Add the upcoming ConnectX-7 device ID. Fixes: 85327a9c4150 ("net/mlx5: Update the list of the PCI supported devices") Signed-off-by: Meir Lichtinger <meirl@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
a6f3b623 |
|
20-Nov-2019 |
Michael Guralnik <michaelgur@mellanox.com> |
net/mlx5: Move devlink registration before interfaces load Register devlink before interfaces are added. This will allow interfaces to use devlink while initalizing. For example, call mlx5_is_roce_enabled. Fixes: aba25279c100 ("net/mlx5e: Add TX reporter support") Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
b7eca940 |
|
12-Nov-2019 |
Shani Shapp <shanish@mellanox.com> |
net/mlx5: Update the list of the PCI supported devices Add the upcoming ConnectX-6 LX device ID. Fixes: 85327a9c4150 ("net/mlx5: Update the list of the PCI supported devices") Signed-off-by: Shani Shapp <shanish@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
4383cfcc |
|
27-Oct-2019 |
Michael Guralnik <michaelgur@mellanox.com> |
net/mlx5: Add devlink reload Implement devlink reload for mlx5. Usage example: devlink dev reload pci/0000:06:00.0 Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
32680da7 |
|
12-Sep-2019 |
zhong jiang <zhongjiang@huawei.com> |
net/mlx5: Remove unneeded variable in mlx5_unload_one mlx5_unload_one do not need local variable to store different value, Hence just remove it. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
74bddb36 |
|
09-Oct-2019 |
Jason Gunthorpe <jgg@ziepe.ca> |
RDMA/mlx5: Delete struct mlx5_priv->mkey_table No users are left, delete it. Link: https://lore.kernel.org/r/20191009160934.3143-5-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
d19a79ee |
|
26-Aug-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: Add device ID of upcoming BlueField-2 Add the device ID of upcoming BlueField-2 integrated ConnectX-6 Dx network controller. Its VFs will be using the generic VF device ID: 0x101e "ConnectX Family mlx5Gen Virtual Function". Fixes: 2e9d3e83ab82 ("net/mlx5: Update the list of the PCI supported devices") Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
c9b9dcb4 |
|
29-Aug-2019 |
Ariel Levkovich <lariel@mellanox.com> |
net/mlx5: Move device memory management to mlx5_core Move the device memory allocation and deallocation commands SW ICM memory to mlx5_core to expose this API for all mlx5_core users. This comes as preparation for supporting SW steering in kernel where it will be required to allocate and register device memory for direct rule insertion. In addition, an API to register this device memory for future remote access operations is introduced using the create_mkey commands. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
00679b63 |
|
19-Aug-2019 |
Michael Guralnik <michaelgur@mellanox.com> |
net/mlx5: Set ODP capabilities for DC transport to max In mlx5_core initialization, query max ODP capabilities for DC transport from FW and set as current capabilities. Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
87175120 |
|
21-Aug-2019 |
Eran Ben Elisha <eranbe@mellanox.com> |
net/mlx5: Add HV VHCA infrastructure HV VHCA is a layer which provides PF to VF communication channel based on HyperV PCI config channel. It implements Mellanox's Inter VHCA control communication protocol. The protocol contains control block in order to pass messages between the PF and VF drivers, and data blocks in order to pass actual data. The infrastructure is agent based. Each agent will be responsible of contiguous buffer blocks in the VHCA config space. This infrastructure will bind agents to their blocks, and those agents can only access read/write the buffer blocks assigned to them. Each agent will provide three callbacks (control, invalidate, cleanup). Control will be invoked when block-0 is invalidated with a command that concerns this agent. Invalidate callback will be invoked if one of the blocks assigned to this agent was invalidated. Cleanup will be invoked before the agent is being freed in order to clean all of its open resources or deferred works. Block-0 serves as the control block. All execution commands from the PF will be written by the PF over this block. VF will ack on those by writing on block-0 as well. Its format is described by struct mlx5_hv_vhca_control_block layout. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9f818c8a |
|
09-Aug-2019 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
mlx5: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. This cleans up a lot of unneeded code and logic around the debugfs files, making all of this much simpler and easier to understand as we don't need to keep the dentries saved anymore. Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: netdev@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0000a5f2 |
|
29-Jul-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Make load_one() and unload_one() symmetric Currently mlx5_load_one() perform device registration using mlx5_register_device(). But mlx5_unload_one() doesn't unregister. Make them symmetric by doing device unregistration in mlx5_unload_one(). Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
c778dd31 |
|
05-Jul-2019 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx5: Accel, Expose accel wrapper for IPsec FPGA function Do not directly call fpga version of IPsec function from main.c. Wrap it by an accel version, and call the wrapper. This will allow deprecating the FPGA IPsec stubs in downstream patch. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d22663ed |
|
28-Jun-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Move pci status reg access mutex to mlx5_pci_init mlx5_pci_init() performs pci specific initialization of the mlx5_core_dev struct. Hence move pci_status_mutex to pci initialization routine mlx5_pci_init(). This allows reusing mlx5_mdev_init() to non PCI devices. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
386e75af |
|
28-Jun-2019 |
Huy Nguyen <huyn@mellanox.com> |
net/mlx5: Rename mlx5_pci_dev_type to mlx5_coredev_type Rename mlx5_pci_dev_type to mlx5_coredev_type to distinguish different mlx5 device types. mlx5_coredev_type represents mlx5_core_dev instance type. Hence keep mlx5_coredev_type in mlx5_core_dev structure. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
b3bd076f |
|
27-Jan-2019 |
Moshe Shemesh <moshe@mellanox.com> |
net/mlx5: Report devlink health on FW fatal issues Report devlink health on FW fatal issues via fw_fatal_reporter. The driver recover flow for FW fatal error is now being handled by the devlink health. Having the recovery controlled by devlink health, the user has the ability to cancel the auto-recovery for debug session and run it manually. Call mlx5_enter_error_state() before calling devlink_health_report() to ensure entering device error state even if auto-recovery is off. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
3e5b72ac |
|
12-Nov-2018 |
Feras Daoud <ferasda@mellanox.com> |
net/mlx5: Issue SW reset on FW assert If a FW assert is considered fatal, indicated by a new bit in the health buffer, reset the FW. After the reset go through the normal recovery flow. Only one PF needs to issue the reset, so an attempt is made to prevent the 2nd function from also issuing the reset. It's not an error if that happens, it just slows recovery. Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
63cbc552 |
|
12-Nov-2018 |
Feras Daoud <ferasda@mellanox.com> |
net/mlx5: Handle SW reset of FW in error flow New mlx5 adapters allow the driver to reset the FW in the event of an error, this action called "SW Reset". When an SW reset is issued on any PF all PFs enter reset state which is a recoverable condition. The existing recovery flow was designed to allow the recovery of a VF after a PF driver reload. This patch adds the sw reset to the NIC states as a preparation for sw reset handling. When a software reset is issued the following occurs: 1. The NIC interface mode is set to 7 while the reset is in progress. 2. Once the reset completes the NIC interface mode is set to 1. Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
8b9d8baa |
|
17-Jul-2018 |
Alex Vesker <valex@mellanox.com> |
net/mlx5: Add Crdump support Crdump allows the driver to retrieve a dump of the FW PCI crspace. This is useful in case of catastrophic issues which may require FW reset. The crspace dump can be used for later debug. Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
b25bbc2f |
|
28-Jun-2018 |
Alex Vesker <valex@mellanox.com> |
net/mlx5: Add Vendor Specific Capability access gateway The Vendor Specific Capability (VSC) is used to activate a gateway interfacing with the device. The gateway is used to read or write device configurations, which are organized in different domains (spaces). A configuration access may result in multiple actions, reads, writes. Example usages are accessing the Crspace domain to read the crspace or locking a device semaphore using the Semaphore domain. The configuration access use pci_cfg_access to prevent parallel access to the VSC space by the driver and userspace calls. Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
1f28d776 |
|
11-Dec-2018 |
Eran Ben Elisha <eranbe@mellanox.com> |
net/mlx5: Move all devlink related functions calls to devlink.c Centralize all devlink related callbacks in one file. In the downstream patch, some more functionality will be added, this patch is preparing the driver infrastructure for it. Currently, move devlink un/register functions calls into this file. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
e1706e62 |
|
10-Jun-2019 |
Yuval Avnery <yuvalav@mellanox.com> |
net/mlx5: Separate IRQ table creation from EQ table creation IRQ allocation should be part of the IRQ table life-cycle. Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
561aa15a |
|
10-Jun-2019 |
Yuval Avnery <yuvalav@mellanox.com> |
net/mlx5: Separate IRQ data from EQ table data IRQ table should only exist for mlx5_core_dev for PF and VF only. EQ table of mediated devices should hold a pointer to the IRQ table of the parent PCI device. Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
86eec50b |
|
10-Jun-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: Support querying max VFs from device For ECPF with eswitch manager privilege, query the host max VF count by querying the device using query_functions command. With this enhancement: 1. flow steering entries are created only for valid vports based on the max VF count of the PF. 2. Driver only queries cap of valid vport. Eswitch requires the max VFs when doing initialization, so do sr-iov init before eswitch init. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
b8a92577 |
|
10-Jun-2019 |
Daniel Jurgens <danielj@mellanox.com> |
net/mlx5: Increase wait time for fw initialization Firmware FLR happens sequentially, in some cases, like when destroying a VM that had many VFs, may require waiting much longer than 10 seconds. Increase the timeout to 2 minutes, and print a wait countdown status every 20 seconds. Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
44f18db5 |
|
04-Jun-2019 |
Jiri Pirko <jiri@mellanox.com> |
mlxfw: Propagate error messages through extack Currently the error messages are printed to dmesg. Propagate them also to directly to user doing the flashing through extack. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9c8bca26 |
|
04-Jun-2019 |
Jiri Pirko <jiri@mellanox.com> |
mlx5: Move firmware flash implementation to devlink Benefit from the devlink flash update implementation and ethtool fallback to it and move firmware flash implementation there. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0ccc171e |
|
30-Jan-2019 |
Yevgeny Kliteynik <kliteyn@mellanox.com> |
net/mlx5: Geneve, Manage Geneve TLV options Use Geneve TLV Options object to manage the flex parser matching on the 32-bit options data. When the first flow with a certain class/type values is requested to be offloaded, create a FW object with FW command (Geneve TLV Options general object) and start counting the number of flows using this object. During this time, any request with a different class/type values will fail to be offloaded. Once the refcount reaches 0, destroy the TLV options general object, and can now offload a flow with any class/type parameters. Geneve TLV Options object is added to core device. It is currently used to manage Geneve TLV options general object allocation in FW and its reference counting only. In the future it will also be used for managing geneve ports by registering callbacks for ndo_udp_tunnel_add/del. Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
87883929 |
|
16-May-2019 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: Fix error handling in mlx5_load() In case mlx5_core_set_hca_defaults fails, it should jump to mlx5_cleanup_fs, fix that. Fixes: c85023e153e3 ("IB/mlx5: Add raw ethernet local loopback support") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
27b942fb |
|
29-Apr-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Get rid of storing copy of device name Currently mlx5 core stores copy of the PCI device name in a mlx5_priv structure and uses pr_warn, pr_err helpers. Get rid of the copy of this name; instead store the parent device pointer that contains name as well as dma specific parameters. This also allows to use kernel's well defined dev_warn, dev_err, dev_dbg device specific print routines. This is also a preparation patch to access non PCI parent device in future. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
aa8106f1 |
|
29-Mar-2019 |
Huy Nguyen <huyn@mellanox.com> |
net/mlx5: Add explicit bar address field Add bar_addr field to store bar-0 address to avoid calling pci_resource_start with hard-coded bar-0 as parameter. Also note that different mlx5 device types will have bar_addr on different bars. This patch does not change any functionality. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
98a8e6fc |
|
29-Mar-2019 |
Huy Nguyen <huyn@mellanox.com> |
net/mlx5: Replace dev_err/warn/info by mlx5_core_err/warn/info Replace pci dev_err/warn/info messages with mlx5_core_err/warn/info messages to provide a better report/debug of different mlx5 device types. This patch does not change any functionality. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
a80d1b68 |
|
29-Mar-2019 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: Break load_one into three stages Using foundation from previous patches to factor mlx5_load_one flow into three stages: 1. mlx5_function_setup() from previous patch to setup function 2. mlx5_init_once() from previous patch to init software objects according to hw caps 3. New mlx5_load() to load mlx5 components This provides a better logical separation of mlx5 core device initialization flow and will help to seamlessly support creating different mlx5 device types such as PF, VF and SF mlx5 sub-function virtual device. This patch does not change any functionality. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
e161105e |
|
29-Mar-2019 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: Function setup/teardown procedures Function setup and teardown procedures are the basic procedure that each mlx5 pci function should perform to boot up a mlx5 device function and initialize basic communication with FW, before allocating any higher level software/firmware resources. This provides a better logical separation of mlx5 core device initialization flow and will help to seamlessly support creating different mlx5 device types such as PF, VF and SF mlx5 sub-function virtual device. This patch does not change any functionality. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
52c368dc |
|
29-Mar-2019 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: Move health and page alloc init to mdev_init Software structure initialization should be in mdev_init stage. This provides a better logical separation of mlx5 core device initialization flow and will help to seamlessly support creating different mlx5 device types such as PF, VF and SF mlx5 sub-function virtual device. This patch does not change any functionality. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
11f3b84d |
|
29-Mar-2019 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: Split mdev init and pci init Separate resources initialization from pci initialization. This provides a better logical separation of mlx5 core device initialization flow and will help to seamlessly support creating different mlx5 device types such as PF, VF and SF mlx5 sub-function virtual device. This patch does not change any functionality. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
868bc06b |
|
29-Mar-2019 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: Remove redundant init functions parameter This patch does not change any functionality. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
e8b26b21 |
|
19-Mar-2019 |
Artemy Kovalyov <artemyko@mellanox.com> |
net/mlx5: Decrease default mr cache size Delete initialization of high order entries in mr cache to decrease initial memory footprint. When required, the administrator can populate the entries with memory keys via the /sys interface. This approach is very helpful to significantly reduce the per HW function memory footprint in virtualization environments such as SRIOV. Fixes: 9603b61de1ee ("mlx5: Move pci device handling from mlx5_ib to mlx5_core") Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reported-by: Shalom Toledo <shalomt@mellanox.com> Acked-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
fca22e7e |
|
24-Feb-2019 |
Moni Shoua <monis@mellanox.com> |
net/mlx5: ODP support for XRC transport is not enabled by default in FW ODP support for XRC transport is not enabled by default in FW, so we need separate ODP checks to enable/disable it. While that, rewrite the set of ODP SRQ support capabilities in way that tests each field separately for clearness, which is not needed for current FW, but better to have it separated. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
85327a9c |
|
27-Jan-2019 |
Eran Ben Elisha <eranbe@mellanox.com> |
net/mlx5: Update the list of the PCI supported devices Add the upcoming ConnectX-6 Dx. In addition, add "ConnectX Family mlx5Gen Virtual Function" device ID. Every new HCA VF will be identified with this device ID. Different VF models will be distinguished by their revision id. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
37b6bb77 |
|
17-Feb-2019 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5: Factor out HCA capabilities functions Combine all HCA capabilities setters under one function and compile out the ODP related function in case kernel was compiled without ODP support. Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
22e939a9 |
|
12-Feb-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: Update enable HCA dependency With the introduction of ECPF, we require that the ECPF driver will aways call enable/disable HCA for that PF in the same way a PF does this for its VFs. The PF is still responsible for calling enable and disable HCA for its VFs. To distinguish between the ECPF executing enable/disable HCA for itself or for the PF, it sets the embedded CPU function bit in the input params struct of these commands. When the bit is cleared and function ID is zero, it refers to the peer PF. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
591905ba |
|
12-Feb-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: Introduce Mellanox SmartNIC and modify page management logic Mellanox's SmartNIC combines embedded CPU(e.g, ARM) processing power with advanced network offloads to accelerate a multitude of security, networking and storage applications. With the introduction of the SmartNIC, there is a new PCI function called Embedded CPU Physical Function(ECPF). And it's possible for a PF to get its ICM pages from the ECPF PCI function. Driver shall identify if it is running on such a function by reading a bit in the initialization segment. When firmware asks for pages, it would issue a page request event specifying how many pages it requests and for which function. That driver responds with a manage_pages command providing the requested pages along with an indication for which function it is providing these pages. The encoding before this patch was as follows: function_id == 0: pages are requested for the function receiving the EQE. function_id != 0: pages are requested for VF identified by the function_id value A new one bit field in the EQE identifies that pages are requested for the ECPF. The notion of page_supplier can be introduced here and to support that, manage pages and query pages were modified so firmware can distinguish the following cases: 1. Function provides pages for itself 2. PF provides pages for its VF 3. ECPF provides pages to itself 4. ECPF provides pages for another function This distinction is possible through the introduction of the bit "embedded_cpu_function" in query_pages, manage_pages and page request EQE. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
224d71cc |
|
11-Feb-2019 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5: Align ODP capability function with netdev coding style Update newly introduced function to be aligned to netdev coding style. Fixes: 46861e3e88be ("net/mlx5: Set ODP SRQ support in firmware") Reported-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
46861e3e |
|
21-Jan-2019 |
Moni Shoua <monis@mellanox.com> |
net/mlx5: Set ODP SRQ support in firmware To avoid compatibility issue with older kernels the firmware doesn't allow SRQ to work with ODP unless kernel asks for it. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
ce4eee53 |
|
18-Jan-2019 |
Michael Guralnik <michaelgur@mellanox.com> |
net/mlx5: Add pci AtomicOps request Calling pci_enable_atomic_ops_to_root enables AtomicOp requests to pci root port. AtomicOp requests will be enabled only if the completer and all intermediate pci bridges support PCI atomic operations. This, together with appropriate settings in the NVCONFIG should enable PCI atomic operations on the device. PCI atomic operations were first introduced in PCI Express Base Specification 2.1. The Supported operations are Swap (Unconditional Swap), CAS (Compare and Swap) and FetchAdd (Fetch and Add). Unlike other atomic operation modes PCI atomic operations gives the user the option to do atomic operations on local memory, without involving verbs api, without it compromising the operation's atomicity. Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
4a0475d5 |
|
03-Dec-2018 |
Miroslav Lichvar <mlichvar@redhat.com> |
mlx5: extend PTP gettime function to read system clock Read the system time right before and immediately after reading the low register of the internal timer. This adds support for the PTP_SYS_OFFSET_EXTENDED ioctl. Cc: Richard Cochran <richardcochran@gmail.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
199fa087 |
|
13-Dec-2018 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5: Continue driver initialization despite debugfs failure The failure to create debugfs entry is unpleasant event, but not enough to abort drier initialization. Align the mlx5_core code to debugfs design and continue execution whenever debugfs_create_dir() successes or not. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
fadd59fc |
|
04-Dec-2018 |
Aviv Heller <avivh@mellanox.com> |
net/mlx5: Introduce inter-device communication mechanism This introduces devcom, a generic mechanism for performing operations on both physical functions of the same Connect-X card. The first user of this API is merged eswitch, which will be introduced in subsequent patches. Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
f3da6577 |
|
28-Nov-2018 |
Leon Romanovsky <leon@kernel.org> |
RDMA/mlx5: Initialize SRQ tables on mlx5_ib Transfer initialization and cleanup from mlx5_priv struct of mlx5_core_dev to be part of mlx5_ib_dev. This completes removal of SRQ from mlx5_core. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
02039fb6 |
|
26-Nov-2018 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: Remove unused events callback and logic The mlx5_interface->event callback is not used by mlx5e/mlx5_ib anymore. We totally remove the delayed events logic work around, since with the dynamic notifier registration API it is not needed anymore, mlx5_ib can register its notifier and start receiving events exactly at the moment it is ready to handle them. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
69c1280b |
|
20-Nov-2018 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: Device events, Use async events chain Move all the generic async events handling into new specific events handling file events.c to keep eq.c file clean from concrete event logic handling. Use new API to register for NOTIFY_ANY to handle generic events and dispatch allowed events to mlx5_core consumers (mlx5_ib and mlx5e) Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
0cf53c12 |
|
20-Nov-2018 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: FWPage, Use async events chain Remove the explicit call to mlx5_core_req_pages_handler on MLX5_EVENT_TYPE_PAGE_REQUEST and let FW page logic to register its own handler when its ready. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
d5d284b8 |
|
19-Nov-2018 |
Saeed Mahameed <saeedm@mellanox.com> |
{net,IB}/mlx5: Move Page fault EQ and ODP logic to RDMA Use the new generic EQ API to move all ODP RDMA data structures and logic form mlx5 core driver into mlx5_ib driver. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
16d76083 |
|
19-Nov-2018 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: EQ, Different EQ types In mlx5 we have three types of usages for EQs, 1. Asynchronous EQs, used internally by mlx5 core for a. FW command completions b. FW page requests c. one EQ for all other Asynchronous events 2. Completion EQs, used for CQ completion (we create one per core) 3. *Special type of EQ (page fault) used for RDMA on demand paging (ODP). *The 3rd type shouldn't be special at least in mlx5 core, it is yet another async events EQ with specific use case, it will be removed in the next two patches, and will completely move its logic to mlx5_ib, as it is rdma specific. In this patch we remove use case (eq type) specific fields from struct mlx5_eq into a new eq type specific structures. struct mlx5_eq_async; truct mlx5_eq_comp; struct mlx5_eq_pagefault; Separate between their type specific flows. In the future we will allow users to create there own generic EQs. for now we will allow only one for ODP in next patches. We will introduce event listeners registration API for those who want to receive mlx5 async events. After that mlx5 eq handling will be clean from feature/user specific handling. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
f2f3df55 |
|
19-Nov-2018 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: EQ, Privatize eq_table and friends Move unnecessary EQ table structures and declaration from the public include/linux/mlx5/driver.h into the private area of mlx5_core and into eq.c/eq.h. Introduce new mlx5 EQ APIs: mlx5_comp_vectors_count(dev); mlx5_comp_irq_get_affinity_mask(dev, vector); And use them from mlx5_ib or mlx5e netdevice instead of direct access to mlx5_core internal structures. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
c8e21b3b |
|
19-Nov-2018 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: EQ, Create all EQs in one place Instead of creating the EQ table in three steps at driver load, - allocate irq vectors - allocate async EQs - allocate completion EQs Gather all of the procedures into one function in eq.c and call it from driver load. This will help us reduce the EQ and EQ table private structures visibility to eq.c in downstream refactoring. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
ca828cb4 |
|
19-Nov-2018 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: EQ, Move all EQ logic to eq.c Move completion EQs flows from main.c to eq.c, reasons: 1) It is where this logic belongs. 2) It will help centralize the EQ logic in one file for downstream refactoring, and future extensions/updates. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
aaa553a6 |
|
19-Nov-2018 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: EQ, Remove redundant completion EQ list lock Completion EQs list is only modified on driver load/unload, locking is not required, remove it. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
2883f352 |
|
19-Nov-2018 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: EQ, No need to store eq index as a field eq->index is used only for completion EQs and is assigned to be the completion eq index, it is used only when traversing the completion eqs list, and it can be calculated dynamically, thus remove the eq->index field. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
1e86ace4 |
|
19-Nov-2018 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: EQ, Use the right place to store/read IRQ affinity hint Currently the cpu affinity hint mask for completion EQs is stored and read from the wrong place, since reading and storing is done from the same index, there is no actual issue with that, but internal irq_info for completion EQs stars at MLX5_EQ_VEC_COMP_BASE offset in irq_info array, this patch changes the code to use the correct offset to store and read the IRQ affinity hint. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
fcd29ad1 |
|
09-Aug-2018 |
Feras Daoud <ferasda@mellanox.com> |
net/mlx5: Add Fast teardown support Today mlx5 devices support two teardown modes: 1- Regular teardown 2- Force teardown This change introduces the enhanced version of the "Force teardown" that allows SW to perform teardown in a faster way without the need to reclaim all the pages. Fast teardown provides the following advantages: 1- Fix a FW race condition that could cause command timeout 2- Avoid moving to polling mode 3- Close the vport to prevent PCI ACK to be sent without been scatter to memory Signed-off-by: Feras Daoud <ferasda@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
5df816e7 |
|
07-Aug-2018 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
net/mlx5: Fix debugfs cleanup in the device init/remove flow When initializing the device (procedure init_one), the driver calls mlx5_pci_init to perform pci initialization. As part of this initialization, mlx5_pci_init creates a debugfs directory. If this creation fails, init_one aborts, returning failure to the caller (which is the probe method caller). The main reason for such a failure to occur is if the debugfs directory already exists. This can happen if the last time mlx5_pci_close was called, debugfs_remove (silently) failed due to the debugfs directory not being empty. Guarantee that such a debugfs_remove failure will not occur by instead calling debugfs_remove_recursive in procedure mlx5_pci_close. Fixes: 59211bd3b632 ("net/mlx5: Split the load/unload flow into hardware and software flows") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
76d5581c |
|
05-Aug-2018 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
net/mlx5: Fix use-after-free in self-healing flow When the mlx5 health mechanism detects a problem while the driver is in the middle of init_one or remove_one, the driver needs to prevent the health mechanism from scheduling future work; if future work is scheduled, there is a problem with use-after-free: the system WQ tries to run the work item (which has been freed) at the scheduled future time. Prevent this by disabling work item scheduling in the health mechanism when the driver is in the middle of init_one() or remove_one(). Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Reviewed-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
342ac844 |
|
08-Aug-2018 |
Denis Drozdov <denisd@mellanox.com> |
net/mlx5: Use max_num_eqs for calculation of required MSIX vectors New firmware has defined new HCA capability field called "max_num_eqs", that is the number of available EQs after subtracting reserved FW EQs. Before this capability the FW reported the EQ number in "log_max_eqs", the reported value also contained FW reserved EQs, but the driver might be failing to load on 320 cpus systems due to the fact that FW reserved EQs were not available to the driver. Now the driver has to obtain max_num_eqs value from new FW to get real number of EQs available. Signed-off-by: Denis Drozdov <denisd@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
358aa5ce |
|
09-May-2018 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5e: Vxlan, move vxlan logic to core driver Move vxlan logic and objects to mlx5 core dirver. Since it going to be used from different mlx5 interfaces. e.g. mlx5e PF NIC netdev and mlx5e E-Switch representors. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
|
#
24406953 |
|
22-Feb-2018 |
Feras Daoud <ferasda@mellanox.com> |
net/mlx5: FW tracer, Enable tracing Add the tracer file to the makefile and add the init function to the load one flow. Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
048f3143 |
|
16-Jul-2018 |
Eran Ben Elisha <eranbe@mellanox.com> |
net/mlx5: Fix tristate and description for MLX5 module Current description did not include new devices. Fix that by proving the correct generic description. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
1ef903bf |
|
26-Mar-2018 |
Daniel Jurgens <danielj@mellanox.com> |
net/mlx5: Free IRQs in shutdown path Some platforms require IRQs to be free'd in the shutdown path. Otherwise they will fail to be reallocated after a kexec. Fixes: 8812c24d28f4 ("net/mlx5: Add fast unload support in shutdown flow") Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
1ae17322 |
|
30-Apr-2018 |
Ilya Lesokhin <ilyal@mellanox.com> |
net/mlx5: Accel, Add TLS tx offload interface Add routines for manipulating TLS TX offload contexts. In Innova TLS, TLS contexts are added or deleted via a command message over the SBU connection. The HW then sends a response message over the same connection. Add implementation for Innova TLS (FPGA-based) hardware. These routines will be used by the TLS offload support in a later patch mlx5/accel is a middle acceleration layer to allow mlx5e and other ULPs to work directly with mlx5_core rather than Innova FPGA or other mlx5 acceleration providers. In the future, when IPSec/TLS or any other acceleration gets integrated into ConnectX chip, mlx5/accel layer will provide the integrated acceleration, rather than the Innova one. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
00c6bcb0 |
|
30-Mar-2018 |
Tal Gilboa <talgi@mellanox.com> |
net/mlx5: Report PCIe link properties with pcie_print_link_status() Use pcie_print_link_status() to report PCIe link speed and possible limitations. Signed-off-by: Tal Gilboa <talgi@mellanox.com> [bhelgaas: changelog] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
|
#
05564d0a |
|
18-Feb-2018 |
Aviad Yehezkel <aviadye@mellanox.com> |
net/mlx5: Add flow-steering commands for FPGA IPSec implementation In order to add a context to the FPGA, we need to get both the software transform context (which includes the keys, etc) and the source/destination IPs (which are included in the steering rule). Therefore, we register new set of firmware like commands for the FPGA. Each time a rule is added, the steering core infrastructure calls the FPGA command layer. If the rule is intended for the FPGA, it combines the IPs information with the software transformation context and creates the respective hardware transform. Afterwards, it calls the standard steering command layer. Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
04e87170 |
|
19-Nov-2017 |
Matan Barak <matanb@mellanox.com> |
net/mlx5: FPGA and IPSec initialization to be before flow steering Some flow steering namespace initialization (i.e. egress namespace) might depend on FPGA capabilities. Changing the initialization order such that the FPGA will be initialized before flow steering. Flow steering fs cmds initialization might depend on IPSec capabilities. Changing the initialization order such that the IPSec will be initialized before flow steering as well. Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
c67f100e |
|
02-Feb-2018 |
Daniel Jurgens <danielj@mellanox.com> |
net/mlx5: Use 128B cacheline size for 128B or larger cachelines The adapter uses the cache_line_128byte setting to set the bounds for end padding. On systems where the cacheline size is greater than 128B use 128B instead of the default of 64B. This results in fewer partial cacheline writes. There's a 50% chance it will pad to the end of a 256B cache line vs only 25% when using 64B. Fixes: f32f5bd2eb7e ("net/mlx5: Configure cache line size for start and end padding") Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
02d92f79 |
|
19-Jan-2018 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: CQ Database per EQ Before this patch the driver had one CQ database protected via one spinlock, this spinlock is meant to synchronize between CQ adding/removing and CQ IRQ interrupt handling. On a system with large number of CPUs and on a work load that requires lots of interrupts, this global spinlock becomes a very nasty hotspot and introduces a contention between the active cores, which will significantly hurt performance and becomes a bottleneck that prevents seamless cpu scaling. To solve this we simply move the CQ database and its spinlock to be per EQ (IRQ), thus per core. Tested with: system: 2 sockets, 14 cores per socket, hyperthreading, 2x14x2=56 cores netperf command: ./super_netperf 200 -P 0 -t TCP_RR -H <server> -l 30 -- -r 300,300 -o -s 1M,1M -S 1M,1M WITHOUT THIS PATCH: Average: CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle Average: all 4.32 0.00 36.15 0.09 0.00 34.02 0.00 0.00 0.00 25.41 Samples: 2M of event 'cycles:pp', Event count (approx.): 1554616897271 Overhead Command Shared Object Symbol + 14.28% swapper [kernel.vmlinux] [k] intel_idle + 12.25% swapper [kernel.vmlinux] [k] queued_spin_lock_slowpath + 10.29% netserver [kernel.vmlinux] [k] queued_spin_lock_slowpath + 1.32% netserver [kernel.vmlinux] [k] mlx5e_xmit WITH THIS PATCH: Average: CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle Average: all 4.27 0.00 34.31 0.01 0.00 18.71 0.00 0.00 0.00 42.69 Samples: 2M of event 'cycles:pp', Event count (approx.): 1498132937483 Overhead Command Shared Object Symbol + 23.33% swapper [kernel.vmlinux] [k] intel_idle + 1.69% netserver [kernel.vmlinux] [k] mlx5e_xmit Tested-by: Song Liu <songliubraving@fb.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Gal Pressman <galp@mellanox.com>
|
#
259bbc57 |
|
31-Dec-2017 |
Maor Gottlieb <maorg@mellanox.com> |
net/mlx5: Fix error handling in load one We didn't store the result of mlx5_init_once, due to that mlx5_load_one returned success on error. Fix that. Fixes: 59211bd3b632 ("net/mlx5: Split the load/unload flow into hardware and software flows") Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
72f36be0 |
|
20-Nov-2017 |
Eran Ben Elisha <eranbe@mellanox.com> |
net/mlx5: Fix mlx5_get_uars_page to return error code Change mlx5_get_uars_page to return ERR_PTR in case of allocation failure. Change all callers accordingly to check the IS_ERR(ptr) instead of NULL. Fixes: 59211bd3b632 ("net/mlx5: Split the load/unload flow into hardware and software flows") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
b6908c29 |
|
14-Dec-2017 |
Alaa Hleihel <alaa@mellanox.com> |
net/mlx5: Fix memory leak in bad flow of mlx5_alloc_irq_vectors Fix a memory leak where in case that pci_alloc_irq_vectors failed, priv->irq_info was not released. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
8978cc92 |
|
09-Jan-2018 |
Eran Ben Elisha <eranbe@mellanox.com> |
{net,ib}/mlx5: Don't disable local loopback multicast traffic when needed There are systems platform information management interfaces (such as HOST2BMC) for which we cannot disable local loopback multicast traffic. Separate disable_local_lb_mc and disable_local_lb_uc capability bits so driver will not disable multicast loopback traffic if not supported. (It is expected that Firmware will not set disable_local_lb_mc if HOST2BMC is running for example.) Function mlx5_nic_vport_update_local_lb will do best effort to disable/enable UC/MC loopback traffic and return success only in case it succeeded to changed all allowed by Firmware. Adapt mlx5_ib and mlx5e to support the new cap bits. Fixes: 2c43c5a036be ("net/mlx5e: Enable local loopback in loopback selftest") Fixes: c85023e153e3 ("IB/mlx5: Add raw ethernet local loopback support") Fixes: bded747bb432 ("net/mlx5: Add raw ethernet local loopback firmware command") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Cc: kernel-team@fb.com Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
c4b76d8d |
|
04-Jan-2018 |
Daniel Jurgens <danielj@mellanox.com> |
net/mlx5: Set num_vhca_ports capability Set the current capability to the max capability. Doing so enables dual port RoCE functionality if supported by the firmware. Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
8737f818 |
|
04-Jan-2018 |
Daniel Jurgens <danielj@mellanox.com> |
net/mlx5: Set software owner ID during init HCA Generate a unique 128bit identifier for each host and pass that value to firmware in the INIT_HCA command if it reports the sw_owner_id capability. Each device bound to the mlx5_core driver will have the same software owner ID. In subsequent patches mlx5_core devices will be bound via a new VPort command so that they can operate together under a single InfiniBand device. Only devices that have the same software owner ID can be bound, to prevent traffic intended for one host arriving at another. The INIT_HCA command length was expanded by 128 bits. The command length is provided as an input FW commands. Older FW does not have a problem receiving this command in the new longer form. Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
dd44572a |
|
02-Jan-2018 |
Moni Shoua <monis@mellanox.com> |
net/mlx5: Enable DC transport Enable DC transport in the firmware to provide its functionality. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
231243c8 |
|
09-Nov-2017 |
Saeed Mahameed <saeedm@mellanox.com> |
Revert "mlx5: move affinity hints assignments to generic code" Before the offending commit, mlx5 core did the IRQ affinity itself, and it seems that the new generic code have some drawbacks and one of them is the lack for user ability to modify irq affinity after the initial affinity values got assigned. The issue is still being discussed and a solution in the new generic code is required, until then we need to revert this patch. This fixes the following issue: echo <new affinity> > /proc/irq/<x>/smp_affinity fails with -EIO This reverts commit a435393acafbf0ecff4deb3e3cb554b34f0d0664. Note: kept mlx5_get_vector_affinity in include/linux/mlx5/driver.h since it is used in mlx5_ib driver. Fixes: a435393acafb ("mlx5: move affinity hints assignments to generic code") Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jes Sorensen <jsorensen@fb.com> Reported-by: Jes Sorensen <jsorensen@fb.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
d2aa060d |
|
26-Sep-2017 |
Huy Nguyen <huyn@mellanox.com> |
net/mlx5: Cancel health poll before sending panic teardown command After the panic teardown firmware command, health_care detects the error in PCI bus and calls the mlx5_pci_err_detected. This health_care flow is no longer needed because the panic teardown firmware command will bring down the PCI bus communication with the HCA. The solution is to cancel the health care timer and its pending workqueue request before sending panic teardown firmware command. Kernel trace: mlx5_core 0033:01:00.0: Shutdown was called mlx5_core 0033:01:00.0: health_care:154:(pid 9304): handling bad device here mlx5_core 0033:01:00.0: mlx5_handle_bad_state:114:(pid 9304): NIC state 1 mlx5_core 0033:01:00.0: mlx5_pci_err_detected was called mlx5_core 0033:01:00.0: mlx5_enter_error_state:96:(pid 9304): start mlx5_3:mlx5_ib_event:3061:(pid 9304): warning: event on port 0 mlx5_core 0033:01:00.0: mlx5_enter_error_state:104:(pid 9304): end Unable to handle kernel paging request for data at address 0x0000003f Faulting instruction address: 0xc0080000434b8c80 Fixes: 8812c24d28f4 ('net/mlx5: Add fast unload support in shutdown flow') Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
7c39afb3 |
|
15-Aug-2017 |
Feras Daoud <ferasda@mellanox.com> |
net/mlx5: PTP code migration to driver core section PTP code is moved to core section of mlx5 driver in order to share it between ethernet and infiniband. This movement involves the following changes: - Change mlx5e_ prefix to be mlx5_ - Add clock structs to Core - Add clock object to mlx5_core_dev - Call Init/Uninit clock from core init/cleanup - Rename mlx5e_tstamp to be mlx5_clock Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Eitan Rabin <rabin@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
10a8d007 |
|
09-Aug-2017 |
Huy Nguyen <huyn@mellanox.com> |
net/mlx5: Remove the flag MLX5_INTERFACE_STATE_SHUTDOWN MLX5_INTERFACE_STATE_SHUTDOWN is not used in the code. Fixes: 5fc7197d3a25 ("net/mlx5: Add pci shutdown callback") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
b3cb5388 |
|
08-Aug-2017 |
Huy Nguyen <huyn@mellanox.com> |
net/mlx5: Skip mlx5_unload_one if mlx5_load_one fails There is an issue where the firmware fails during mlx5_load_one, the health_care timer detects the issue and schedules a health_care call. Then the mlx5_load_one detects the issue, cleans up and quits. Then the health_care starts and calls mlx5_unload_one to clean up the resources that no longer exist and causes kernel panic. The root cause is that the bit MLX5_INTERFACE_STATE_DOWN is not set after mlx5_load_one fails. The solution is removing the bit MLX5_INTERFACE_STATE_DOWN and quit mlx5_unload_one if the bit MLX5_INTERFACE_STATE_UP is not set. The bit MLX5_INTERFACE_STATE_DOWN is redundant and we can use MLX5_INTERFACE_STATE_UP instead. Fixes: 5fc7197d3a25 ("net/mlx5: Add pci shutdown callback") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
12148f5a |
|
11-Jul-2017 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5e: Avoid using multiple blank lines To fix these checkpatch complaints: CHECK: Please don't use multiple blank lines Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
26d15948 |
|
15-Aug-2017 |
Zhu Yanjun <yanjun.zhu@oracle.com> |
mlx5: remove unnecessary pci_set_drvdata() The driver core clears the driver data to NULL after device_release or on probe failure. Thus, it is not necessary to manually clear the device driver data to NULL. Cc: Joe Jin <joe.jin@oracle.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a435393a |
|
13-Jul-2017 |
Sagi Grimberg <sagi@grimberg.me> |
mlx5: move affinity hints assignments to generic code generic api takes care of spreading affinity similar to what mlx5 open coded (and even handles better asymmetric configurations). Ask the generic API to spread affinity for us, and feed him pre_vectors that do not participate in affinity settings (which is an improvement to what we had before). The affinity assignments should match what mlx5 tried to do earlier but now we do not set affinity to async, cmd and pages dedicated vectors. Also, remove mlx5e_get_cpu and introduce mlx5e_get_node (used for allocation purposes) and mlx5_get_vector_affinity (for indirection table construction) as they provide the needed information. Luckily, we have generic helpers to get cpumask and node given a irq vector. mlx5_get_vector_affinity will be used by mlx5_ib in a subsequent patch. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
78249c42 |
|
13-Jul-2017 |
Sagi Grimberg <sagi@grimberg.me> |
mlx5: convert to generic pci_alloc_irq_vectors Now that we have a generic code to allocate an array of irq vectors and even correctly spread their affinity, correctly handle cpu hotplug events and more, were much better off using it. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
97834eba |
|
06-Jun-2017 |
Erez Shitrit <erezsh@mellanox.com> |
net/mlx5: Delay events till ib registration ends When mlx5_ib registers itself to mlx5_core as an interface, it will call mlx5_add_device which will call mlx5_ib interface add callback, in case the latter successfully returns, only then mlx5_core will add it to the interface list and async events will be forwarded to mlx5_ib. Between mlx5_ib interface add callback and mlx5_core adding the mlx5_ib interface to its devices list, arriving mlx5_core events can be missed by the new mlx5_ib registering interface. In other words: thread 1: mlx5_ib: mlx5_register_interface(dev) thread 1: mlx5_core: mlx5_add_device(dev) thread 1: mlx5_core: ctx = dev->add => (mlx5_ib)->mlx5_ib_add thread 2: mlx5_core_event: **new event arrives, forward to dev_list thread 1: mlx5_core: add_ctx_to_dev_list(ctx) /* previous event was missed by the new interface.*/ It is ok to miss events before dev->add (mlx5_ib)->mlx5_ib_add_device but not after. We fix this race by accumulating the events that come between the ib_register_device (inside mlx5_add_device->(dev->add)) till the adding to the list completes and fire them to the new registering interface after that. Fixes: f1ee87fe55c8 ("net/mlx5: Organize device list API in one place") Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
e80541ec |
|
05-Jun-2017 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: Add CONFIG_MLX5_ESWITCH Kconfig Allow to selectively build the driver with or without sriov eswitch, VF representors and TC offloads. Also remove the need of two ndo ops structures (sriov & basic) and keep only one unified ndo ops, compile out VF SRIOV ndos when not needed (MLX5_ESWITCH=n), and for VF netdev calling those ndos will result in returning -EPERM. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Cc: Jes Sorensen <jsorensen@fb.com> Cc: kernel-team@fb.com
|
#
eeb66cdb |
|
04-Jun-2017 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: Separate between E-Switch and MPFS Multi-Physical Function Switch (MPFs) is required for when multi-PF configuration is enabled to allow passing user configured unicast MAC addresses to the requesting PF. Before this patch eswitch.c used to manage the HW MPFS l2 table, E-Switch always (regardless of sriov) enabled vport(0) (NIC PF) vport's contexts update on unicast mac address list changes, to populate the PF's MPFS L2 table accordingly. In downstream patch we would like to allow compiling the driver without E-Switch functionalities, for that we move MPFS l2 table logic out of eswitch.c into its own file, and provide Kconfig flag (MLX5_MPFS) to allow compiling out MPFS for those who don't want Multi-PF support. NIC PF netdevice will now directly update MPFS l2 table via the new MPFS API. VF netdevice has no access to MPFS L2 table, so E-Switch will remain responsible of updating its MPFS l2 table on behalf of its VFs. Due to this change we also don't require enabling vport(0) (PF vport) unicast mac changes events anymore, for when SRIOV is not enabled. Which means E-Switch is now activated only on SRIOV activation, and not required otherwise. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Cc: Jes Sorensen <jsorensen@fb.com> Cc: kernel-team@fb.com
|
#
c85023e1 |
|
30-May-2017 |
Huy Nguyen <huyn@mellanox.com> |
IB/mlx5: Add raw ethernet local loopback support Currently, unicast/multicast loopback raw ethernet (non-RDMA) packets are sent back to the vport. A unicast loopback packet is the packet with destination MAC address the same as the source MAC address. For multicast, the destination MAC address is in the vport's multicast filter list. Moreover, the local loopback is not needed if there is one or none user space context. After this patch, the raw ethernet unicast and multicast local loopback are disabled by default. When there is more than one user space context, the local loopback is enabled. Note that when local loopback is disabled, raw ethernet packets are not looped back to the vport and are forwarded to the next routing level (eswitch, or multihost switch, or out to the wire depending on the configuration). Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
bebb23e6 |
|
25-Apr-2017 |
Ilan Tayari <ilant@mellanox.com> |
net/mlx5: Accel, Add IPSec acceleration interface Add routines for manipulating the hardware IPSec SA database (SADB). In Innova IPSec, a Security Association (SA) is added or deleted via a command message over the SBU connection. The HW then sends a response message over the same connection. Add implementation for Innova IPSec (FPGA-based) hardware. These routines will be used by the IPSec offload support in a later patch However they may also be used by others such as RDMA and RoCE IPSec. mlx5/accel is a middle acceleration layer to allow mlx5e and other ULPs to work directly with mlx5_core rather than Innova FPGA or other mlx5 acceleration providers. In this patchset we add Innova IPSec support and mlx5/accel delegates IPSec offloads to Innova routines. In the future, when IPSec/TLS or any other acceleration gets integrated into ConnectX chip, mlx5/accel layer will provide the integrated acceleration, rather than the Innova one. Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
9410733c |
|
14-Jun-2017 |
Ilan Tayari <ilant@mellanox.com> |
net/mlx5: FPGA, Move FPGA init/cleanup to init_once The FPGA init and cleanup routines should be called just once per device. Move them to the init_once and cleanup_once routines. Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
52ec462e |
|
26-Mar-2017 |
Ilan Tayari <ilant@mellanox.com> |
net/mlx5: Add reserved-gids support Reserved GIDs are entries in the GID table in use by the mlx5_core and its submodules (e.g. FPGA, SRIOV, E-Swtich, netdev). The entries are reserved at the high indexes of the GID table. A mlx5 submodule may reserve a certain amount of GIDs for its own use during the load sequence by calling mlx5_core_reserve_gids, and must also take care to un-reserve these GIDs when it closes. Reservation is only allowed during the load sequence and before any interfaces (e.g. mlx5_ib or mlx5_en) are up. After reservation, a submodule may call mlx5_core_reserved_gid_alloc/ free to allocate entries from the reserved GIDs pool. Reserve a GID table entry for every supported FPGA QP. A later patch in the patchset will remove them from being reported to IB core. Another such patch will make use of these for FPGA QPs in Innova NIC. Added lib/mlx5.h to serve as a library for mlx5 submodlues, and to expose only public mlx5 API, more mlx5 library files will be added in future submissions. Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
9ade8c7c |
|
24-May-2017 |
Ilan Tayari <ilant@mellanox.com> |
net/mlx5: Set interface flags before cleanup in unload_one In load_one, the interface flags are changed from down to up, only after initializing the interfaces. In unload_one, the flags are changed from up to down before the interface cleanup. Change the cleanup order to be opposite to initialization order. This fixes flag consistency between init and cleanup. Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
2a0165a0 |
|
30-Mar-2017 |
Mohamad Haj Yahia <mohamad@mellanox.com> |
net/mlx5: Cancel delayed recovery work when unloading the driver Draining the health workqueue will ignore future health works including the one that report hardware failure and thus we can't enter error state Instead cancel the recovery flow and make sure only recovery flow won't be scheduled. Fixes: 5e44fca50470 ('net/mlx5: Only cancel recovery work when cleaning up device') Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
8ce59b16 |
|
19-Jun-2017 |
Gal Pressman <galp@mellanox.com> |
net/mlx5: Fix driver load error flow when firmware is stuck When wait for firmware init fails, previous code would mistakenly return success and cause inconsistency in the driver state. Fixes: 6c780a0267b8 ("net/mlx5: Wait for FW readiness before initializing command interface") Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
bbad7c21 |
|
20-Jun-2017 |
Myron Stowe <myron.stowe@redhat.com> |
net/mlx5e: Use device ID defines Use Mellanox device ID definitions in the driver's mlx5 ID table so tools such as 'grep' and 'cscope' can be used to help find correlated material (such as INTx Masking quirks: d76d2fe05fd PCI: Convert Mellanox broken INTx quirks to be for listed devices only). No functional change intended. Signed-off-by: Myron Stowe <myron.stowe@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8812c24d |
|
09-Feb-2017 |
Majd Dibbiny <majd@mellanox.com> |
net/mlx5: Add fast unload support in shutdown flow Adding a support to flush all HW resources with one FW command and skip all the heavy unload flows of the driver on kernel shutdown. There's no need to free all the SW context since a new fresh kernel will be loaded afterwards. Regarding the FW resources, they should be closed, otherwise we will have leakage in the FW. To accelerate this flow, we execute one command in the beginning that tells the FW that the driver isn't going to close any of the FW resources and asks the FW to clean up everything. Once the commands complete, it's safe to close the PCI resources and finish the routine. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
1b9f533a |
|
28-May-2017 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: Avoid using multiple blank lines Fixed bunch of this checkpatch complaint: CHECK: Please don't use multiple blank lines Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
bd10838a |
|
28-May-2017 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: Fix some spelling mistakes Fixed few places where endianness was misspelled and one spot whwere output was: CHECK: 'endianess' may be misspelled - perhaps 'endianness'? CHECK: 'ouput' may be misspelled - perhaps 'output'? Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
6c780a02 |
|
08-Jun-2017 |
Eli Cohen <eli@mellanox.com> |
net/mlx5: Wait for FW readiness before initializing command interface Before attempting to initialize the command interface we must wait till the fw_initializing bit is clear. If we fail to meet this condition the hardware will drop our configuration, specifically the descriptors page address. This scenario can happen when the firmware is still executing an FLR flow and did not finish yet so the driver needs to wait for that to finish. Fixes: e3297246c2c8 ('net/mlx5_core: Wait for FW readiness on startup') Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
91828bd8 |
|
28-May-2017 |
Majd Dibbiny <majd@mellanox.com> |
net/mlx5: Enable 4K UAR only when page size is bigger than 4K When the page size isn't bigger than 4K, there is no added value of enabling 4K UAR feature in the Firmware. Modified the condition of enabling the 4K UAR accordingly. Fixes: f502d834950a ("net/mlx5: Activate support for 4K UARs") Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
f0d7ae95 |
|
29-May-2017 |
Arnd Bergmann <arnd@arndb.de> |
net/mlx5: avoid build warning for uniprocessor Building the driver with CONFIG_SMP disabled results in a harmless warning: ethernet/mellanox/mlx5/core/main.c: In function 'mlx5_irq_set_affinity_hint': ethernet/mellanox/mlx5/core/main.c:615:6: error: unused variable 'irq' [-Werror=unused-variable] It's better to express the conditional compilation using IS_ENABLED() here, as that lets the compiler see what the intented use for the variable is, and that it can be silently discarded. Fixes: b665d98edc9a ("net/mlx5: Tolerate irq_set_affinity_hint() failures") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b665d98e |
|
18-May-2017 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx5: Tolerate irq_set_affinity_hint() failures Add tolerance to failures of irq_set_affinity_hint(). Its role is to give hints that optimizes performance, and should not block the driver load. In non-SMP systems, functionality is not available as there is a single core, and all these calls definitely fail. Hence, do not call the function and avoid the warning prints. Fixes: db058a186f98 ("net/mlx5_core: Set irq affinity hints") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Cc: kernel-team@fb.com Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
e29341fb |
|
13-Mar-2017 |
Ilan Tayari <ilant@mellanox.com> |
net/mlx5: FPGA, Add basic support for Innova Mellanox Innova is a NIC with ConnectX and an FPGA on the same board. The FPGA is a bump-on-the-wire and thus affects operation of the mlx5_core driver on the ConnectX ASIC. Add basic support for Innova in mlx5_core. This allows using the Innova card as a regular NIC, by detecting the FPGA capability bit, and verifying its load state before initializing ConnectX interfaces. Also detect FPGA fatal runtime failures and enter error state if they ever happen. All new FPGA-related logic is placed in its own subdirectory 'fpga', which may be built by selecting CONFIG_MLX5_FPGA. This prepares for further support of various Innova features in later patchsets. Additional details about hardware architecture will be provided as more features get submitted. Signed-off-by: Ilan Tayari <ilant@mellanox.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
2e9d3e83 |
|
19-Apr-2017 |
Noa Osherovich <noaos@mellanox.com> |
net/mlx5: Update the list of the PCI supported devices Add the BlueField device and VF IDs to the supported devices list. Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
55378a23 |
|
30-Mar-2017 |
Mohamad Haj Yahia <mohamad@mellanox.com> |
net/mlx5: Fix driver load bad flow when having fw initializing timeout If FW is stuck in initializing state we will skip the driver load, but current error handling flow doesn't clean previously allocated command interface resources. Fixes: e3297246c2c8 ('net/mlx5_core: Wait for FW readiness on startup') Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
7768d197 |
|
25-Sep-2016 |
Roi Dayan <roid@mellanox.com> |
net/mlx5: E-Switch, Add control for encapsulation Implement the devlink e-switch encapsulation control set and get callbacks. Apply the value set by the user on the switchdev offloads mode when creating the fast FDB table where offloaded rules will be set. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
d0dd989f |
|
23-Feb-2017 |
Majd Dibbiny <majd@mellanox.com> |
net/mlx5: Update the list of the PCI supported devices Rename the ConnectX-5 PCIe 4.0 to be ConnectX-5 Ex. Also add the upcoming ConnectX-6 and it's VF IDs to the list. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
5f40b4ed |
|
21-Mar-2017 |
Maor Gottlieb <maorg@mellanox.com> |
net/mlx5: Increase number of max QPs in default profile With ConnectX-4 sharing SRQs from the same space as QPs, we hit a limit preventing some applications to allocate needed QPs amount. Double the size to 256K. Fixes: e126ba97dba9e ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5d47f6c8 |
|
10-Mar-2017 |
Daniel Jurgens <danielj@mellanox.com> |
net/mlx5: Don't save PCI state when PCI error is detected When a PCI error is detected the PCI state could be corrupt, don't save it in that flow. Save the state after initialization. After restoring the PCI state during slot reset save it again, restoring the state destroys the previously saved state info. Fixes: 05ac2c0b7438 ('net/mlx5: Fix race between PCI error handlers and health work') Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f32f5bd2 |
|
19-Nov-2015 |
Daniel Jurgens <danielj@mellanox.com> |
net/mlx5: Configure cache line size for start and end padding There is a hardware feature that will pad the start or end of a DMA to be cache line aligned to avoid RMWs on the last cache line. The default cache line size setting for this feature is 64B. This change configures the hardware to use 128B alignment on systems with 128B cache lines. In addition we lower bound MPWRQ stride by HCA cacheline in mlx5e, MPWRQ stride should be at least the HCA cacheline, the current default is 64B and in case HCA_CAP.cach_line_128byte capability is set, MPWRQ RX stride will automatically be aligned to 128B. Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
9eb78923 |
|
11-Jan-2017 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: Change ENOTSUPP to EOPNOTSUPP As ENOTSUPP is specific to NFS, change the return error value to EOPNOTSUPP in various places in the mlx5 driver. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Suggested-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
701052c5 |
|
14-Dec-2016 |
Gal Pressman <galp@mellanox.com> |
net/mlx5: Move cached hca caps to designated caps struct The caps structure consists of hca caps and port/management caps, all under one roof. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
f82eed45 |
|
01-Dec-2016 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5: Remove information print after attempt to load mlx5_ib module Infiniband part of mlx5 driver can be compiled as a module or as a part of bzImage (compiled in). In the second case, the call to request_module will return an error -ENOENT. It will cause to a misleading print "failed request module on mlx5_ib". This patch removes this print, In order to comply with mlx4. Fixes: f66f049fb738 ("net/mlx5_core: Request the mlx5 IB module on driver load") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
5e44fca5 |
|
10-Jan-2017 |
Daniel Jurgens <danielj@mellanox.com> |
net/mlx5: Only cancel recovery work when cleaning up device Do not attempt to drain the health workqueue when unloading the device in the recovery flow, this can cause a deadlock when the recovery work tries to cancel itself with sync. Because the work is no longer unconditionally canceled when unloading, it must be explicitly canceled in the AER flow. fixes: 689a248df83b ("net/mlx5: Cancel recovery work in remove flow") Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f502d834 |
|
03-Jan-2017 |
Eli Cohen <eli@mellanox.com> |
net/mlx5: Activate support for 4K UARs Activate 4K UAR support for firmware versions that support it. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
5fe9dec0 |
|
03-Jan-2017 |
Eli Cohen <eli@mellanox.com> |
IB/mlx5: Use blue flame register allocator in mlx5_ib Make use of the blue flame registers allocator at mlx5_ib. Since blue flame was not really supported we remove all the code that is related to blue flame and we let all consumers to use the same blue flame register. Once blue flame is supported we will add the code. As part of this patch we also move the definition of struct mlx5_bf to mlx5_ib.h as it is only used by mlx5_ib. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
01187175 |
|
03-Jan-2017 |
Eli Cohen <eli@mellanox.com> |
net/mlx5: Add interface to get reference to a UAR A reference to a UAR is required to generate CQ or EQ doorbells. Since CQ or EQ doorbells can all be generated using the same UAR area without any effect on performance, we are just getting a reference to any available UAR, If one is not available we allocate it but we don't waste the blue flame registers it can provide and we will use them for subsequent allocations. We get a reference to such UAR and put in mlx5_priv so any kernel consumer can make use of it. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
2f5ff264 |
|
03-Jan-2017 |
Eli Cohen <eli@mellanox.com> |
mlx5: Fix naming convention with respect to UARs This establishes a solid naming conventions for UARs. A UAR (User Access Region) can have size identical to a system page or can be fixed 4KB depending on a value queried by firmware. Each UAR always has 4 blue flame register which are used to post doorbell to send queue. In addition, a UAR has section used for posting doorbells to CQs or EQs. In this patch we change names to reflect this conventions. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
d9aaed83 |
|
02-Jan-2017 |
Artemy Kovalyov <artemyko@mellanox.com> |
{net,IB}/mlx5: Refactor page fault handling * Update page fault event according to last specification. * Separate code path for page fault EQ, completion EQ and async EQ. * Move page fault handling work queue from mlx5_ib static variable into mlx5_core page fault EQ. * Allocate memory to store ODP event dynamically as the events arrive, since in atomic context - use mempool. * Make mlx5_ib page fault handler run in process context. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7d0cc6ed |
|
02-Jan-2017 |
Artemy Kovalyov <artemyko@mellanox.com> |
IB/mlx5: Add MR cache for large UMR regions In this change we turn mlx5_ib_update_mtt() into generic mlx5_ib_update_xlt() to perfrom HCA translation table modifiactions supporting both atomic and process contexts and not limited by number of modified entries. Using this function we increase preallocated MRs up to 16GB. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d151d73d |
|
28-Dec-2016 |
Eli Cohen <eli@mellanox.com> |
net/mlx5: Avoid shadowing numa_node Avoid using a local variable named numa_node to avoid shadowing a public one. Fixes: db058a186f98 ('net/mlx5_core: Set irq affinity hints') Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
689a248d |
|
28-Dec-2016 |
Daniel Jurgens <danielj@mellanox.com> |
net/mlx5: Cancel recovery work in remove flow If there is pending delayed work for health recovery it must be canceled if the device is being unloaded. Fixes: 05ac2c0b7438 ("net/mlx5: Fix race between PCI error handlers and health work") Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
883371c4 |
|
28-Dec-2016 |
Noa Osherovich <noaos@mellanox.com> |
net/mlx5: Check FW limitations on log_max_qp before setting it When setting HCA capabilities, set log_max_qp to be the minimum between the selected profile's value and the HCA limitation. Fixes: 938fe83c8dcb ('net/mlx5_core: New device capabilities...') Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a5a1d1c2 |
|
21-Dec-2016 |
Thomas Gleixner <tglx@linutronix.de> |
clocksource: Use a plain u64 instead of cycle_t There is no point in having an extra type for extra confusion. u64 is unambiguous. Conversion was done with the following coccinelle script: @rem@ @@ -typedef u64 cycle_t; @fix@ typedef cycle_t; @@ -cycle_t +u64 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org>
|
#
f9c14e46 |
|
06-Dec-2016 |
Kamal Heib <kamalh@mellanox.com> |
net/mlx5: Fix query ISSI flow In old FWs query ISSI command is not supported and for some of those FWs it might fail with status other than "MLX5_CMD_STAT_BAD_OP_ERR". In such case instead of failing the driver load, we will treat any FW status other than 0 for Query ISSI FW command as ISSI not supported and assume ISSI=0 (most basic driver/FW interface). In case of driver syndrom (query ISSI failure by driver) we will fail driver load. Fixes: f62b8bb8f2d3 ('net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality') Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f663ad98 |
|
06-Dec-2016 |
Kamal Heib <kamalh@mellanox.com> |
net/mlx5: Verify module parameters Verify the mlx5_core module parameters by making sure that they are in the expected range and if they aren't restore them to their default values. Fixes: 9603b61de1ee ('mlx5: Move pci device handling from mlx5_ib to mlx5_core') Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5e7dfeb7 |
|
24-Nov-2016 |
Geliang Tang <geliangtang@gmail.com> |
net/mlx5: drop duplicate header delay.h Drop duplicate header delay.h from mlx5/core/main.c. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Acked-by: Matan Barak <matanb@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bffaa916 |
|
22-Nov-2016 |
Roi Dayan <roid@mellanox.com> |
net/mlx5: E-Switch, Add control for inline mode Implement devlink show and set of HW inline-mode. The supported modes: none, link, network, transport. We currently support one mode for all vports so set is done on all vports. When eswitch is first initialized the inline-mode is queried from the FW. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
012e50e1 |
|
17-Nov-2016 |
Huy Nguyen <huyn@mellanox.com> |
net/mlx5: Set driver version into firmware If driver_version capability bit is enabled, set driver version to firmware after the init HCA command, for display purposes. Example of driver version: "Linux,mlx5_core,3.0-1" Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0e97a340 |
|
03-Nov-2016 |
Huy Nguyen <huyn@mellanox.com> |
net/mlx5: Fix invalid pointer reference when prof_sel parameter is invalid When prof_sel is invalid, mlx5_core_warn is called but the mlx5_core_dev is not initialized yet. Solution is moving the prof_sel code after dev->pdev assignment Fixes: 2974ab6e8bd8 ('net/mlx5: Improve driver log messages') Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
86490d9a |
|
18-Oct-2016 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: Add ConnectX-5 PCIe 4.0 VF device ID For the mlx5 driver to support ConnectX-5 PCIe 4.0 VFs, we add the device ID "0x101a" to mlx5_core_pci_table. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
04c0c1ab |
|
25-Oct-2016 |
Mohamad Haj Yahia <mohamad@mellanox.com> |
net/mlx5: PCI error recovery health care simulation In case that the kernel PCI error handlers are not called, we will trigger our own recovery flow. The health work will give priority to the kernel pci error handlers to recover the PCI by waiting for a small period, if the pci error handlers are not triggered the manual recovery flow will be executed. We don't save pci state in case of manual recovery because it will ruin the pci configuration space and we will lose dma sync. Fixes: 89d44f0a6c73 ('net/mlx5_core: Add pci error handlers to mlx5_core driver') Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
05ac2c0b |
|
25-Oct-2016 |
Mohamad Haj Yahia <mohamad@mellanox.com> |
net/mlx5: Fix race between PCI error handlers and health work Currently there is a race between the health care work and the kernel pci error handlers because both of them detect the error, the first one to be called will do the error handling. There is a chance that health care will disable the pci after resuming pci slot. Also create a separate WQ because now we will have two types of health works, one for the error detection and one for the recovery. Fixes: 89d44f0a6c73 ('net/mlx5_core: Add pci error handlers to mlx5_core driver') Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bba1574c |
|
25-Oct-2016 |
Daniel Jurgens <danielj@mellanox.com> |
net/mlx5: Always Query HCA caps after setting them Always query the HCA caps after setting them to update the capablities data structures. Not doing so results in incorrect capabilities being reported including max_dc, max_qp and several others. Fixes: 59211bd3b632 ("net/mlx5: Split the load/unload flow into hardware and software flows") Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f1ee87fe |
|
09-Sep-2016 |
Mohamad Haj Yahia <mohamad@mellanox.com> |
net/mlx5: Organize device list API in one place Hide the exposed (external) mlx5_dev_list and mlx5_intf_mutex and expose an organized modular API to manage and manipulate mlx5 devices list. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c2d6e31a |
|
09-Sep-2016 |
Mohamad Haj Yahia <mohamad@mellanox.com> |
net/mlx5: Align sriov/eswitch modules with the new load/unload flow. Init/cleanup sriov/eswitch in the core software context init/cleanup flows. Attach/detach sriov/eswitch in the core load/unload flows. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
59211bd3 |
|
09-Sep-2016 |
Mohamad Haj Yahia <mohamad@mellanox.com> |
net/mlx5: Split the load/unload flow into hardware and software flows Gather all software context creating/destroying in one function and call it once in the first load and in the last unload. load/unload functions will now receive indication if we need to create/destroy the software contexts. In internal/pci error do the unload/load flows without releasing the software objects. In this way we perserve the sw core state and it help us restoring old driver state after PCI error/shutdown. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
737a234b |
|
09-Sep-2016 |
Mohamad Haj Yahia <mohamad@mellanox.com> |
net/mlx5: Introduce attach/detach to interface API Add attach/detach callbacks to interface API. This is crucial for implementing seamless reset flow which releases the hardware and it's resources upon detach while keeping software structures and state (e.g netdev) then reset and reallocate the hardware needed resources upon attach. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6b6adee3 |
|
09-Sep-2016 |
Mohamad Haj Yahia <mohamad@mellanox.com> |
net/mlx5: SRIOV core code refactoring Simplify the code and makes it look modular and symmetric. Split sriov enable/disable to two levels: device level and pci level. When user enable/disable sriov (via sriov_configure driver callback) we will enable/disable both device and pci sriov. When driver load/unload we will enable/disable (on demand) only device sriov while keeping the PCI sriov enabled for next driver load. On internal/pci error, VFs will be kept enabled on PCI and the reset is done only in device level. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1061c90f |
|
18-Aug-2016 |
Mohamad Haj Yahia <mohamad@mellanox.com> |
net/mlx5: Fix pci error recovery flow When PCI error is detected we should save the state of the pci prior to disabling it. Also when receiving pci slot reset call we need to verify that the device is responsive. Fixes: 89d44f0a6c73 ('net/mlx5_core: Add pci error handlers to mlx5_core driver') Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
917b41aa |
|
30-May-2016 |
Aviv Heller <avivh@mellanox.com> |
net/mlx5: Configure IB devices according to LAG state When mlx5_ib is loaded, we would like each card's IB devices to be added according to its LAG state (one IB device, instead of two, is to be added if LAG is active). Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
7907f23a |
|
17-Apr-2016 |
Aviv Heller <avivh@mellanox.com> |
net/mlx5: Implement RoCE LAG feature Available on dual port cards only, this feature keeps track, using netdev LAG events, of the bonding and link status of each port's PF netdev. When both of the card's PF netdevs are enslaved to the same bond/team master, and only them, LAG state is active. During LAG, only one IB device is present for both ports. In addition to the above, this commit includes FW commands used for managing the LAG, new facilities for adding and removing a single device by interface, and port remap functionality according to bond events. Please note that this feature is currently used only for mimicking Ethernet bonding for RoCE - netdevs functionality is not altered, and their bonding continues to be managed solely by bond/team driver. Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
2974ab6e |
|
28-Jul-2016 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: Improve driver log messages Remove duplicate pci dev name printing in mlx5_core_err. Use mlx5_core_{warn,info,err} where possible to have the pci info in the driver log messages. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Parvi Kaustubhi <parvik@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
c4f287c4 |
|
19-Jul-2016 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: Unify and improve command interface Now as all commands use mlx5 ifc interface, instead of doing two calls for executing a command we embed command status checking into mlx5_cmd_exec to simplify the interface. Also we do here some cleanup for redundant software structures (inbox/outbox) and functions and improved command failure output. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
feae9087 |
|
01-Jul-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: Add devlink interface The devlink interface is initially used to set/get the mode of the SRIOV e-switch. Currently, these are only stubs for get/set, down-stream patch will actually fill them out. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d57847dc |
|
30-Jun-2016 |
Daniel Jurgens <danielj@mellanox.com> |
net/mlx5: Fix wait_vital for VFs and remove fixed sleep The device ID for VFs is in a different location than PFs. This results in the poll always timing out for VFs. There's no good way to read the VF device ID without using the PF's configuration space. Switch to waiting for the health poll to start incrementing. Also remove the 1s sleep at the beginning. fixes: 89d44f0a6c73 ('net/mlx5_core: Add pci error handlers to mlx5_core driver') Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7092fe86 |
|
26-Jun-2016 |
Majd Dibbiny <majd@mellanox.com> |
net/mlx5: Add ConnectX-5 PCIe 4.0 to list of supported devices Add the upcoming ConnectX-5 PCIe 4.0 device to the list of supported devices by the mlx5 driver. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1466cc5b |
|
23-Jun-2016 |
Yevgeny Petrilin <yevgenyp@mellanox.com> |
net/mlx5: Rate limit tables support Configuring and managing HW rate limit tables. The HW holds a table of rate limits, each rate is associated with an index in that table. Later a Send Queue uses this index to set the rate limit. Multiple Send Queues can have the same rate limit, which is represented by a single entry in this table. Even though a rate can be shared, each queue is being rate limited independently of others. The SW shadow of this table holds the rate itself, the index in the HW table and the refcount (number of queues) working with this rate. The exported functions are mlx5_rl_add_rate and mlx5_rl_remove_rate. Number of different rates and their values are derived from HW capabilities. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
94c6825e |
|
17-Apr-2016 |
Matan Barak <matanb@mellanox.com> |
net/mlx5_core: Use tasklet for user-space CQ completion events Previously, we've fired all our completion callbacks straight from our ISR. Some of those callbacks were lightweight (for example, mlx5 Ethernet napi callbacks), but some of them did more work (for example, the user-space RDMA stack uverbs' completion handler). Besides that, doing more than the minimal work in ISR is generally considered wrong, it could even lead to a hard lockup of the system. Since when a lot of completion events are generated by the hardware, the loop over those events could be so long, that we'll get into a hard lockup by the system watchdog. In order to avoid that, add a new way of invoking completion events callbacks. In the interrupt itself, we add the CQs which receive completion event to a per-EQ list and schedule a tasklet. In the tasklet context we loop over all the CQs in the list and invoke the user callback. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
5a7b27eb |
|
28-Apr-2016 |
Maor Gottlieb <maorg@mellanox.com> |
net/mlx5: Initializing CPU reverse mapping Allocating CPU rmap and add entry for each IRQ. CPU rmap is used in aRFS to get the RX queue number of the RX completion interrupts. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5fc7197d |
|
21-Apr-2016 |
Majd Dibbiny <majd@mellanox.com> |
net/mlx5: Add pci shutdown callback This patch introduces kexec support for mlx5. When switching kernels, kexec() calls shutdown, which unloads the driver and cleans its resources. In addition, remove unregister netdev from shutdown flow. This will allow a clean shutdown, even if some netdev clients did not release their reference from this netdev. Releasing The HW resources only is enough as the kernel is shutting down Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
64dbbdfe |
|
21-Apr-2016 |
Majd Dibbiny <majd@mellanox.com> |
net/mlx5_core: Add ConnectX-5 to list of supported devices Add the upcoming ConnectX-5 devices (PF and VF) to the list of supported devices by the mlx5 driver. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b06e7de8 |
|
23-Feb-2016 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5_core: Refactor device capability function Device capability function was called similar in all places. It was called twice for every queried parameter, while the difference between calls was in HCA capability mode only. The change proposed unify these calls into one function. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
91d9ed84 |
|
23-Feb-2016 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5_core: Fix caching ATOMIC endian mode capability Add caching of maximum device capability of ATOMIC endian mode. Fixes: f91e6d8941bf ('net/mlx5_core: Add setting ATOMIC endian mode') Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
0ba42241 |
|
01-Mar-2016 |
Moshe Lazer <moshel@mellanox.com> |
net/mlx5: Fix global UAR mapping Avoid double mapping of io mapped memory, Device page may be mapped to non-cached(NC) or to write-combining(WC). The code before this fix tries to map it both to WC and NC contrary to what stated in Intel's software developer manual. Here we remove the global WC mapping of all UARS "dev->priv.bf_mapping", since UAR mapping should be decided per UAR (e.g we want different mappings for EQs, CQs vs QPs). Caller will now have to choose whether to map via write-combining API or not. mlx5e SQs will choose write-combining in order to perform BlueFlame writes. Fixes: 88a85f99e51f ('TX latency optimization to save DMA reads') Signed-off-by: Moshe Lazer <moshel@mellanox.com> Reviewed-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a606b0f6 |
|
29-Feb-2016 |
Matan Barak <matanb@mellanox.com> |
net/mlx5: Refactor mlx5_core_mr to mkey Mlx5's mkey mechanism is also used for memory windows. The current code base uses MR (memory region) naming, which is inaccurate. Changing MR to mkey in order to represent its different usages more accurately. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
cfb5e088 |
|
14-Jan-2016 |
Haggai Abramovsky <hagaya@mellanox.com> |
IB/mlx5: Add CQE version 1 support to user QPs and SRQs Enforce working with CQE version 1 when the user supports CQE version 1 and asked to work this way. If the user still works with CQE version 0, then use the default CQE version to tell the Firmware that the user still works in the older mode. After this patch, the kernel still reports CQE version 0. Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
0b6e26ce |
|
17-Jan-2016 |
Doron Tsur <doront@mellanox.com> |
net/mlx5_core: Fix trimming down IRQ number With several ConnectX-4 cards installed on a server, one may receive irqn > 255 from the kernel API, which we mistakenly trim to 8bit. This causes EQ creation failure with the following stack trace: [<ffffffff812a11f4>] dump_stack+0x48/0x64 [<ffffffff810ace21>] __setup_irq+0x3a1/0x4f0 [<ffffffff810ad7e0>] request_threaded_irq+0x120/0x180 [<ffffffffa0923660>] ? mlx5_eq_int+0x450/0x450 [mlx5_core] [<ffffffffa0922f64>] mlx5_create_map_eq+0x1e4/0x2b0 [mlx5_core] [<ffffffffa091de01>] alloc_comp_eqs+0xb1/0x180 [mlx5_core] [<ffffffffa091ea99>] mlx5_dev_init+0x5e9/0x6e0 [mlx5_core] [<ffffffffa091ec29>] init_one+0x99/0x1c0 [mlx5_core] [<ffffffff812e2afc>] local_pci_probe+0x4c/0xa0 Fixing it by changing of the irqn type from u8 to unsigned int to support values > 255 Fixes: 61d0e73e0a5a ('net/mlx5_core: Use the the real irqn in eq->irqn') Reported-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Doron Tsur <doront@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b0844444 |
|
29-Dec-2015 |
Eran Ben Elisha <eranbe@mellanox.com> |
net/mlx5_core: Introduce access function to read internal timer A preparation step which adds support for reading the hardware internal timer and the hardware timestamping from the CQE. In addition, advertize device_frequency_khz HCA capability. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f91e6d89 |
|
14-Dec-2015 |
Eran Ben Elisha <eranbe@mellanox.com> |
net/mlx5_core: Add setting ATOMIC endian mode HW is capable of 2 requestor endianness modes for standard 8 Bytes atomic: BE (0x0) and host endianness (0x1). Read the supported modes from hca atomic capabilities and configure HW to host endianness mode if supported. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
86d722ad |
|
10-Dec-2015 |
Maor Gottlieb <maorg@mellanox.com> |
net/mlx5: Use flow steering infrastructure for mlx5_en Expose the new flow steering API and remove the old one. Few changes are required: 1. The Ethernet flow steering follows the existing implementation, but uses the new steering API. The old flow steering implementation is removed. 2. Move the E-switch FDB management to use the new API. 3. When driver is loaded call to mlx5_init_fs which initialize the flow steering tree structure, open namespaces for NIC receive and for E-switch FDB. 4. Call to mlx5_cleanup_fs when the driver is unloaded. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
073bb189 |
|
01-Dec-2015 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: Introducing E-Switch and l2 table E-Switch is the software entity that represents and manages ConnectX4 inter-HCA ethernet l2 switching. E-Switch has its own Virtual Ports, each Vport/vNIC/VF can be connected to the device through a vport of an e-switch. Each e-switch is managed by one vNIC identified by HCA_CAP.vport_group_manager (usually it is the PF/vport[0]), and its main responsibility is to forward each packet to the right vport. e-Switch needs to manage its own l2-table and FDB tables. L2 table is a flow table that is managed by FW, it is needed for Multi-host (Multi PF) configuration for inter HCA switching between PFs. FDB table is a flow table that is totally managed by e-Switch driver, its main responsibility is to switch packets between e-Swtich internal vports and uplink vport that belong to the same. This patch introduces only e-Swtich l2 table management, FDB managemnt will come later when ethernet SRIOV/VFs will be enabled. preperation for ethernet sriov and l2 table management. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fc50db98 |
|
01-Dec-2015 |
Eli Cohen <eli@mellanox.com> |
net/mlx5_core: Add base sriov support This patch adds SRIOV base support for mlx5 supported devices. The same driver is used for both PFs and VFs; VFs are identified by the driver through the flag MLX5_PCI_DEV_IS_VF added to the pci table entries. Virtual functions are created as usual through writing a value to the sriov_numvs sysfs file of the PF device. Upon instantiating VFs, they will all be probed by the driver on the hypervisor. One can gracefully unbind them through /sys/bus/pci/drivers/mlx5_core/unbind. mlx5_wait_for_vf_pages() was added to ensure that when a VF dies without executing proper teardown, the hypervisor driver waits till all of the pages that were allocated at the hypervisor to maintain its operation are returned. In order for the VF to be operational, the PF needs to call enable_hca for it. This can be done before the VFs are created through a call to pci_enable_sriov. If the there are VFs assigned to a VMs when the driver of the PF is unloaded, all the VF will experience system error and PF driver unloads cleanly; in this case pci_disable_sriov is not called and the devices will show when running lspci. Once the PF driver is reloaded, it will sync its data structures which maintain state on its VFs. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0b107106 |
|
01-Dec-2015 |
Eli Cohen <eli@mellanox.com> |
net/mlx5_core: Modify enable/disable hca functions Modify these functions to have func_id argument to state which device we are referring to. This is done as a preparation for SRIOV support where a PF driver needs to control its virtual functions. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e3297246 |
|
14-Oct-2015 |
Eli Cohen <eli@mellanox.com> |
net/mlx5_core: Wait for FW readiness on startup On device initialization, wait till firmware indicates that that it is done with initialization before proceeding to initialize the device. Also update initialization segment layout to match driver/firmware interface definitions. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
89d44f0a |
|
14-Oct-2015 |
Majd Dibbiny <majd@mellanox.com> |
net/mlx5_core: Add pci error handlers to mlx5_core driver This patch implement the pci_error_handlers for mlx5_core which allow the driver to recover from PCI error. Once an error is detected in the PCI, the mlx5_pci_err_detected is called and it: 1) Marks the device to be in 'Internal Error' state. 2) Dispatches an event to the mlx5_ib to flush all the outstanding cqes with error. 3) Returns all the on going commands with error. 4) Unloads the driver. Afterwards, the FW is reset and mlx5_pci_slot_reset is called and it enables the device and restore it's pci state. If the later succeeds, mlx5_pci_resume is called, and it loads the SW stack. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
adec640e |
|
28-Aug-2015 |
Christoph Hellwig <hch@lst.de> |
mlx5: stop including <asm-generic/kmap_types.h> <linux/highmem.h> is the placace the get the kmap type flags, asm-generic files are generic implementations only to be used by architecture code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
#
ac6ea6e8 |
|
08-Oct-2015 |
Eli Cohen <eli@mellanox.com> |
net/mlx5_core: Use private health thread for each device Use a single threaded work queue for each device in the system instead of using one thread for any device. This is required so we can concurrently process system error handling for all the devices that need that. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a31208b1 |
|
25-Sep-2015 |
Majd Dibbiny <majd@mellanox.com> |
net/mlx5_core: New init and exit flow for mlx5_core In the new flow, we separate the pci initialization and teardown from the initialization and teardown of the other resources. init_one calls mlx5_pci_init that handles the pci resources initialization. It then calls mlx5_load_one to initialize the remainder of the resources. When removing a device, remove_one is invoked. However, now remove_one calls mlx5_unload_one to free all the resources except the pci resources. When mlx5_unload_one returns, mlx5_pci_close is called to free the pci resources. The above separation will allow us to implement the pci error handlers and suspend and resume callbacks. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fe1e1876 |
|
05-Aug-2015 |
Carol L Soto <clsoto@linux.vnet.ibm.com> |
net/mlx5_core: Set log_uar_page_sz for non 4K page size architecture failed to configure the page size for architectures with page size different than 4K. Fixes: 938fe83 ("net/mlx5_core: New device capabilities handling") Signed-off-by: Carol L Soto <clsoto@linux.vnet.ibm.com> Acked-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
88a85f99 |
|
23-Jul-2015 |
Achiad Shochat <achiad@mellanox.com> |
net/mlx5e: TX latency optimization to save DMA reads A regular TX WQE execution involves two or more DMA reads - one to fetch the WQE, and another one per WQE gather entry. These DMA reads obviously increase the TX latency. There are two mlx5 mechanisms to bypass these DMA reads: 1) Inline WQE 2) Blue Flame (BF) An inline WQE contains a whole packet, thus saves the DMA read/s of the regular WQE gather entry/s. Inline WQE support was already added in the previous commit. A BF WQE is written directly to the device I/O mapped memory, thus enables saving the DMA read that fetches the WQE. The BF WQE I/O write must be in cache line granularity, thus uses the CPU write combining mechanism. A BF WQE I/O write acts also as a TX doorbell for notifying the device of new TX WQEs. A BF WQE is written to the same I/O mapped address as the regular TX doorbell, thus this address is being mapped twice - once by ioremap() and once by io_mapping_map_wc(). While both mechanisms reduce the TX latency, they both consume more CPU cycles than a regular WQE: - A BF WQE must still be written to host memory, in addition to being written directly to the device I/O mapped memory. - An inline WQE involves copying the SKB data into it. To handle this tradeoff, we introduce here a heuristic algorithm that strives to avoid using these two mechanisms in case the TX queue is being back-pressured by the device, and limit their usage rate otherwise. An inline WQE will always be "Blue Flamed" (written directly to the device I/O mapped memory) while a BF WQE may not be inlined (may contain gather entries). Preliminary testing using netperf UDP_RR shows that the latency goes down from 17.5us to 16.9us, while the message rate (tested with pktgen) stays the same. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
311c7c71 |
|
23-Jul-2015 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5e: Allocate DMA coherent memory on reader NUMA node By affinity hints and XPS, each mlx5e channel is assigned a CPU core. Channel DMA coherent memory that is written by the NIC and read by SW (e.g CQ buffer) is allocated on the NUMA node of the CPU core assigned for the channel. Channel DMA coherent memory that is written by SW and read by the NIC (e.g SQ/RQ buffer) is allocated on the NUMA node of the NIC. Doorbell record (written by SW and read by the NIC) is an exception since it is accessed by SW more frequently. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
211e6c80 |
|
04-Jun-2015 |
Majd Dibbiny <majd@mellanox.com> |
net/mlx5_core: Get vendor-id using the query adapter command Add two wrapper functions to the query adapter command: 1. mlx5_query_board_id -- replaces the old mlx5_cmd_query_adapter. 2. mlx5_core_query_vendor_id -- retrieves the vendor_id from the query_adapter command. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
707c4602 |
|
04-Jun-2015 |
Majd Dibbiny <majd@mellanox.com> |
net/mlx5_core: Add new query HCA vport commands Added the implementation for the following commands: 1. QUERY_HCA_VPORT_GID 2. QUERY_HCA_VPORT_PKEY 3. QUERY_HCA_VPORT_CONTEXT They will be needed when we move to work with ISSI > 0 in the IB driver too. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e74a1db0 |
|
04-Jun-2015 |
Haggai Abramonvsky <hagaya@mellanox.com> |
net/mlx5_core: Check the return bitmask when querying ISSI The determination of the supported ISSI versions should be conditioned on the returned mask, and not only on the return status of the query ISSI command, fix that. Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com> Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f62b8bb8 |
|
28-May-2015 |
Amir Vadai <amirv@mellanox.com> |
net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality This is the Ethernet part of the driver for the Mellanox ConnectX(R)-4 Single/Dual-Port Adapter supporting 100Gb/s with VPI. The driver extends the existing mlx5 driver with Ethernet functionality. This patch contains the driver entry points but does not include transmit and receive (see the previous patch in the series) routines. It also adds the option MLX5_CORE_EN to Kconfig to enable/disable the Ethernet functionality. Currently, Kconfig is programmed to make Ethernet and Infiniband functionality mutally exclusive. Also changed MLX5_INFINIBAND to be depandant on MLX5_CORE instead of selecting it, since MLX5_CORE could be selected without MLX5_INFINIBAND being selected. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
938fe83c |
|
28-May-2015 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5_core: New device capabilities handling - Query all supported types of dev caps on driver load. - Store the Cap data outbox per cap type into driver private data. - Introduce new Macros to access/dump stored caps (using the auto generated data types). - Obsolete SW representation of dev caps (no need for SW copy for each cap). - Modify IB driver to use new macros for checking caps. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e281682b |
|
28-May-2015 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5_core: HW data structs/types definitions cleanup mlx5_ifc.h was heavily modified here since it is now generated by a script from the device specification (PRM rev 0.25). This specification is backward compatible to existing hardware. Some structures/fields were added here in order to enable the Ethernet functionality of the driver. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
db058a18 |
|
28-May-2015 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5_core: Set irq affinity hints Preparation for upcoming ethernet driver. - Move msix array from eq_table struct to priv since its not related to eq_table - Intorduce irq_info struct to hold all irq information - Move name from mlx5_eq to irq_info struct since it is irq property. - Set IRQ affinity hints Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Rana Shahout <ranas@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
64613d94 |
|
02-Apr-2015 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5_core: Extend struct mlx5_interface to support multiple protocols Preparation for ethernet driver. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
233d05d2 |
|
02-Apr-2015 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5_core: Move completion eqs from mlx5_ib to mlx5_core Preparation for ethernet driver. These functions will be used in drivers other than mlx5_ib. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4ae6c18c |
|
02-Apr-2015 |
Achiad Shochat <achiad@mellanox.com> |
net/mlx5_core: Update module info macros for ConnectX4 Support Declare the support of new ConnectX4 HCA in module info Pump up the module version Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
302bdf68 |
|
02-Apr-2015 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5_core: Fix Mellanox copyright note Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9ef9baa2 |
|
02-Apr-2015 |
Eli Cohen <eli@dev.mellanox.co.il> |
net/mlx5_core: Avoid setting DC requestor/responder resources PRM does not support setting these values so avoid setting them. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6b9f53bc |
|
11-Mar-2015 |
Julia Lawall <Julia.Lawall@lip6.fr> |
net/mlx5_core: don't export static symbol The semantic patch that fixes this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r@ type T; identifier f; @@ static T f (...) { ... } @@ identifier r.f; declarer name EXPORT_SYMBOL; @@ -EXPORT_SYMBOL(f); // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
de61390c |
|
11-Feb-2015 |
Eli Cohen <eli@dev.mellanox.co.il> |
net/mlx5_core: Fix configuration of log_uar_page_sz The current code failed to configure the page size for architectures with page size different than 4K - PPC for example. Signed-off-by: Carol L Soto <clsoto@us.ibm.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1c755cc5 |
|
03-Feb-2015 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5_core: Move to use hex PCI device IDs Align the IDs in the code with the modinfo, lspci -n, etc tools outputs. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
28c167fa |
|
01-Dec-2014 |
Eli Cohen <eli@dev.mellanox.co.il> |
net/mlx5_core: Add more supported devices Add ConnectX-4LX to the list of supported devices as well as their virtual functions. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3a9e161a |
|
01-Dec-2014 |
Eli Cohen <eli@dev.mellanox.co.il> |
net/mlx5_core: Fix min vectors value in mlx5_enable_msix mlx5 requires at least one interrupt vector for completions so fix the minvec argument to pci_enable_msix_range() accordingly. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f66f049f |
|
01-Dec-2014 |
Eli Cohen <eli@dev.mellanox.co.il> |
net/mlx5_core: Request the mlx5 IB module on driver load Call request module on mlx5_ib so it will be available for applications requiring it, such as installers that require boot over IB. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
364d1798 |
|
05-Nov-2014 |
Eli Cohen <eli@dev.mellanox.co.il> |
net/mlx5_core: Fix race on driver load When events arrive at driver load, the event handler gets called even before the spinlock and list are initialized. Fix this by moving the initialization before EQs creation. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f832dc82 |
|
01-Oct-2014 |
Eli Cohen <eli@mellanox.com> |
net/mlx5_core: Add ConnectX-4 to list of supported devices Add the upcoming ConnectX-4 device to the list of supported devices by then mlx5 driver. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b775516b |
|
01-Oct-2014 |
Eli Cohen <eli@mellanox.com> |
net/mlx5_core: use set/get macros in device caps Transform device capabilities related commands to use set/get macros to manipulate command mailboxes. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c7a08ac7 |
|
01-Oct-2014 |
Eli Cohen <eli@mellanox.com> |
net/mlx5_core: Update device capabilities handling Rearrange struct mlx5_caps so it has a "gen" field to represent the current capabilities configured for the device. Max capabilities can also be queried from the device. Also update capabilities struct to contain more fields as per the latest revision if firmware specification. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4d2f9bbb |
|
28-Jul-2014 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
mlx5: Adjust events to use unsigned long param instead of void * In the event flow, we currently pass only a port number in the void *data argument. Rather than pass a pointer to the event handlers, we should use an "unsigned long" parameter, and pass the port number value directly. In the future, if necessary for some events, we can use the unsigned long parameter to pass a pointer. Based on a patch by Eli Cohen <eli@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f241e749 |
|
28-Jul-2014 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
mlx5: minor fixes (mainly avoidance of hidden casts) There were many places where parameters which should be u8/u16 were integer type. Additionally, in 2 places, a check for a non-null pointer was added before dereferencing the pointer (this is actually a bug fix). Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9603b61d |
|
28-Jul-2014 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
mlx5: Move pci device handling from mlx5_ib to mlx5_core In preparation for a new mlx5 device which is VPI (i.e., ports can be either IB or ETH), move the pci device functionality from mlx5_ib to mlx5_core. This involves the following changes: 1. Move mlx5_core_dev struct out of mlx5_ib_dev. mlx5_core_dev is now an independent structure maintained by mlx5_core. mlx5_ib_dev now has a pointer to that struct. This requires changing a lot of places where the core_dev struct was accessed via mlx5_ib_dev (now, this needs to be a pointer dereference). 2. All PCI initializations are now done in mlx5_core. Thus, it is now mlx5_core which does pci_register_device (and not mlx5_ib, as was previously). 3. mlx5_ib now registers itself with mlx5_core as an "interface" driver. This is very similar to the mechanism employed for the mlx4 (ConnectX) driver. Once the HCA is initialized (by mlx5_core), it invokes the interface drivers to do their initializations. 4. There is a new event handler which the core registers: mlx5_core_event(). This event handler invokes the event handlers registered by the interfaces. Based on a patch by Eli Cohen <eli@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1a91de28 |
|
07-May-2014 |
Joe Perches <joe@perches.com> |
mellanox: Logging message cleanups Use a more current logging style. o Coalesce formats o Add missing spaces for coalesced formats o Align arguments for modified formats o Add missing newlines for some logging messages o Use DRV_NAME as part of format instead of %s, DRV_NAME to reduce overall text. o Use ..., ##__VA_ARGS__ instead of args... in macros o Correct a few format typos o Use a single line message where appropriate Signed-off-by: Joe Perches <joe@perches.com> Acked-By: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c120e9e0 |
|
07-Mar-2014 |
Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com> |
IB/mlx5_core: remove unreachable function call in module init The call to mlx5_health_cleanup() in the module init function can never be reached. Removing it. Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com> Acked-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3bcdb17a |
|
23-Feb-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/mlx5: Keep mlx5 MRs in a radix tree under device This will be useful when processing signature errors on a specific key. The mlx5 driver will lookup the matching mlx5 memory region structure and mark it as dirty (contains signature errors). Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
169a1d85 |
|
19-Feb-2014 |
Amir Vadai <amirv@mellanox.com> |
net,IB/mlx: Bump all Mellanox driver versions Bump all Mellanox driver versions. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f3c9407b |
|
18-Feb-2014 |
Alexander Gordeev <agordeev@redhat.com> |
mlx5: Use pci_enable_msix_range() instead of pci_enable_msix() As result of deprecation of MSI-X/MSI enablement functions pci_enable_msix() and pci_enable_msi_block() all drivers using these two interfaces need to be updated to use the new pci_enable_msi_range() and pci_enable_msix_range() interfaces. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Cc: Eli Cohen <eli@mellanox.com> Cc: linux-rdma@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-pci@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1bde6e30 |
|
14-Jan-2014 |
Eli Cohen <eli@dev.mellanox.co.il> |
IB/mlx5: Abort driver cleanup if teardown hca fails Do not continue with cleanup flow. If this ever happens we can check which resources remained open. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
87b8de49 |
|
23-Oct-2013 |
Eli Cohen <eli@dev.mellanox.co.il> |
mlx5: Clear reserved area in set_hca_cap() Firmware spec requires reserved fields to be cleared when calling set_hca_cap. Current code queries and copy to the set area, possibly resulting in reserved bits not cleared. This patch copies only writable fields to the set area. Fix also typo - msx => max Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
c1868b82 |
|
11-Sep-2013 |
Eli Cohen <eli@mellanox.com> |
mlx5: Remove checksum on command interface commands Checksum calculations consume CPU resources and can be significant to the rate of resource creation/destruction. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
cd23b14b |
|
18-Jul-2013 |
Eli Cohen <eli@dev.mellanox.co.il> |
mlx5_core: Implement new initialization sequence Introduce enbale_hca and disable_hca commands to signify when the driver starts or ceases to operate on the device. In addition the driver will use boot and init pages count; boot pages is required to allow firmware to complete boot commands and the other to complete init hca. Command interface revision is bumped to 4 to enforce using supported firmware. This patch breaks compatibility with old versions of firmware (< 4); however, the first GA firmware we will publish will support version 4 so this should not be a problem. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
288dde9f |
|
10-Jul-2013 |
Moshe Lazer <moshel@mellanox.com> |
mlx5_core: Adjust hca_cap.uar_page_sz to conform to Connect-IB spec Sparse reported an endianness bug in the assignment to hca_cap.uar_page_sz. Fix the declaration of this field to be __be16 (which is what is in the firmware spec), renaming the field to log_uar_pg_size to conform to the spec, which fixes the endianness bug reported by sparse. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Moshe Lazer <moshel@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
e126ba97 |
|
07-Jul-2013 |
Eli Cohen <eli@mellanox.com> |
mlx5: Add driver for Mellanox Connect-IB adapters The driver is comprised of two kernel modules: mlx5_ib and mlx5_core. This partitioning resembles what we have for mlx4, except that mlx5_ib is the pci device driver and not mlx5_core. mlx5_core is essentially a library that provides general functionality that is intended to be used by other Mellanox devices that will be introduced in the future. mlx5_ib has a similar role as any hardware device under drivers/infiniband/hw. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> [ Merge in coccinelle fixes from Fengguang Wu <fengguang.wu@intel.com>. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>
|